On distributed convex optimization under 
inequality and equality constraints via 
primal-dual subgradient methods 



Abstract 

We consider a general multi-agent convex optimization problem where the agents are to collectively 
minimize a global objective function subject to a global inequality constraint, a global equality constraint, 
and a global constraint set. The objective function is defined by a sum of local objective functions, while 
the global constraint set is produced by the intersection of local constraint sets. In particular, we study 
two cases: one where the equality constraint is absent, and the other where the local constraint sets 
are identical. We devise two distributed primal-dual subgradient algorithms which are based on the 
characterization of the primal-dual optimal solutions as the saddle points of the Lagrangian and penalty 
functions. These algorithms can be implemented over networks with changing topologies but satisfying 
a standard connectivity property, and allow the agents to asymptotically agree on optimal solutions and 
optimal values of the optimization problem under the Slater's condition. 

I. Introduction 

Recent advances in sensing, communication and computation technologies are challenging the 
way in which control mechanisms are designed for their efficient exploitation in a coordinated 
manner. This has motivated a wealth of algorithms for information processing, cooperative 
control, and optimization of large-scale networked multi-agent systems performing a variety 
of tasks. Due to a lack of a centralized authority, the proposed algorithms aim to be executed 
by individual agents through local actions, with the main feature of being robust to dynamic 
changes of network topologies. 

The authors are with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, 9500 Gilman 
Dr, La Jolla CA, 92093, {mizhu,soniamd}@ucsd.edu 
In this paper, we consider a general multi-agent optimization problem where the goal is to 
minimize a global objective function, given as a sum of local objective functions, subject to 
global constraints, which include an inequality constraint, an equality constraint and a (state) 
constraint set. Each local objective function is convex and only known to one particular agent. 
On the other hand, the inequality (resp. equality) constraint is given by a convex (resp. affine) 
function and known by all agents. Each node has its own convex constraint set, and the global 
constraint set is defined as their intersection. This problem is motivated by others in distributed 
estimation [24], [30], distributed source localization [28], network utility maximization [15], 
optimal flow control in power systems [26], [33] and optimal shape changes of mobile robots [9]. 
An important feature of the problem is that the objective and (or) constraint functions depend 
upon a global decision vector. This requires the design of distributed algorithms where, on the 
one hand, agents can align their decisions through a local information exchange and, on the 
other hand, the common decisions will coincide with an optimal solution and the optimal value. 

Literature Review. In [j2l| and ll32ll . the authors develop a general framework for parallel and 
distributed computation over a set of processors. Consensus problems, a class of canonical 
problems on networked multi-agent systems, have been intensively studied since then. A neces- 
sarily incomplete list of references includes [fTTIl . Il25l tackling continuous-time consensus, fS^, 
lfT2l . ifTSl investigating discrete-time versions, and [fTTl where asynchronous implementation of 
consensus algorithms is discussed. The papers lIH, [|T4l|. [|3T|| treat randomized consensus via 
gossip communication, achieving consensus through quantized information and consensus over 
random graphs, respectively. The convergence rate of consensus algorithms is discussed, e.g., 
in ll27l . Il34l . and the author in derives conditions to achieve different consensus values. 

In robotics and control communities, convex optimization has been exploited to design algorithms 
coordinating mobile multi-agent systems. In [8], in order to increase the connectivity of 
a multi-agent system, a distributed supergradient-based algorithm is proposed to maximize the 
second smallest eigenvalue of the Laplacian matrix of the state-dependent proximity graph of 
agents. In [9], optimal shape changes of mobile robots are achieved through second-order cone 
programming techniques. In [10], a target tracking problem is addressed by means of a generic 
semidefinite program where the constraints of network connectivity and full target coverage are 
articulated as linear matrix inequalities. In [19], in order to attain the highest possible positioning 
accuracy for mobile robots, the authors express the covariance matrix of the pose errors as a 
functional relation of measurement frequencies, and then formulate an optimal sensing problem 
as a convex program in the measurement frequencies. 

The recent papers [21], [23] are the most relevant to our work. In [21], the authors solve a 
multi-agent unconstrained convex optimization problem through a novel combination of average 
consensus algorithms with subgradient methods. More recently, the paper [23] further takes 
local constraint sets into account. To deal with these constraints, the authors in [23] present an 
extension of their distributed subgradient algorithm, by projecting the original algorithm onto the 
local constraint sets. Two cases are solved in [23]: the first assumes that the network topologies 
can dynamically change and satisfy a periodic strong connectivity assumption (i.e., the union 
of the network topologies over a bounded period of time is strongly connected), but then the 
local constraint sets are identical; the second requires that the communication graphs are (fixed 
and) complete and then the local constraint sets can be different. Another related paper is [13], 
which addresses a special case of [23] where the network topology is fixed and all the local 
constraint sets are identical. 

Statement of Contributions. Building on the work [23], this paper further incorporates global 
inequality and equality constraints. More precisely, we study two cases: one in which the equality 
constraint is absent, and the other in which the local constraint sets are identical. For the first 
case, we adopt a Lagrangian relaxation approach, define a Lagrangian dual problem and devise 
the distributed Lagrangian primal-dual subgradient algorithm (DLPDS, for short) based on the 
characterization of the primal-dual optimal solutions as the saddle points of the Lagrangian 
function. The DLPDS algorithm involves each agent updating its estimates of the saddle points 
via a combination of an average consensus step, a subgradient (or supgradient) step and a 
primal (or dual) projection step onto its local constraint set (or a compact set containing the dual 
optimal set). The DLPDS algorithm is shown to asymptotically converge to a pair of primal-dual 
optimal solutions under the Slater's condition and the periodic strong connectivity assumption. 
Furthermore, each agent asymptotically agrees on the optimal value by implementing a dynamic 
average consensus algorithm developed in [35], which allows a multi-agent system to track 
time-varying average values. 

For the second case, to dispense with the additional equality constraint, we adopt a penalty 
relaxation approach, while defining a penalty dual problem and devising the distributed penalty 
primal-dual subgradient algorithm (DPPDS, for short). Unlike the first case, the dual optimal 
set of the second case may not be bounded, and thus the dual projection steps are not involved 
in the DPPDS algorithm. As a result, the dual estimates, and thus the (primal) subgradients, may not 
be uniformly bounded. This challenge is addressed by a more careful choice of step-sizes. We 
show that the DPPDS algorithm asymptotically converges to a primal optimal solution and the 
optimal value under the Slater's condition and the periodic strong connectivity assumption. 

For the special case where the global inequality and equality constraints are not taken into 
account, this paper extends the results in [23] to a more general scenario where the network 
topologies satisfy the periodic strong connectivity assumption, and the local constraint sets can 
be different, while relaxing an interior-point condition requirement. We refer the readers to 
Section IVI-DI for additional information. 

II. Problem formulation and assumptions 

A. Problem formulation 

Consider a network of agents labeled by $V := \{1, \dots, N\}$ that can only interact with each 
other through local communication. The objective of the multi-agent group is to cooperatively 
solve the following optimization problem: 

$\min_{x \in \mathbb{R}^n} f(x) := \sum_{i=1}^{N} f^{[i]}(x)$, s.t. $g(x) \le 0$, $h(x) = 0$, $x \in X := \cap_{i=1}^{N} X^{[i]}$, (1) 

where $f^{[i]} : \mathbb{R}^n \to \mathbb{R}$ is the convex objective function of agent $i$, $X^{[i]} \subseteq \mathbb{R}^n$ is the compact 
and convex constraint set of agent $i$, and $x$ is a global decision vector. Assume that $f^{[i]}$ and 
$X^{[i]}$ are only known by agent $i$, and probably different. The function $g : \mathbb{R}^n \to \mathbb{R}^m$ is known 
to all the agents with each component $g_\ell$, for $\ell \in \{1, \dots, m\}$, being convex. The inequality 
$g(x) \le 0$ is understood component-wise; i.e., $g_\ell(x) \le 0$, for all $\ell \in \{1, \dots, m\}$, and represents 
a global inequality constraint. The function $h : \mathbb{R}^n \to \mathbb{R}^\nu$, defined as $h(x) := Ax - b$ with 
$A \in \mathbb{R}^{\nu \times n}$, represents a global equality constraint, and is known to all the agents. We denote 
$Y := \{x \in \mathbb{R}^n \mid g(x) \le 0, \; h(x) = 0\}$, and assume that the set of feasible points is non-empty; 
i.e., $X \cap Y \ne \emptyset$. Since $X$ is compact and $Y$ is closed, we can deduce that $X \cap Y$ is compact. 
The convexity of $f^{[i]}$ implies that of $f$ and thus $f$ is continuous. In this way, the optimal value $p^*$ 
of the problem (1) is finite and $X^*$, the set of primal optimal points, is non-empty. Throughout 
this paper, we suppose the following Slater's condition holds: 
Assumption 2.1 (Slater's Condition): There exists a vector $\bar{x} \in X$ such that $g(\bar{x}) < 0$ and 
$h(\bar{x}) = 0$. And there exists a relative interior point $\tilde{x}$ of $X$, i.e., $\tilde{x} \in X$ and there exists an open 
sphere $S$ centered at $\tilde{x}$ such that $S \cap \mathrm{aff}(X) \subseteq X$ with $\mathrm{aff}(X)$ being the affine hull of $X$, such 
that $h(\tilde{x}) = 0$. 

Remark 2.1: In this paper, the quantities (e.g., functions, scalars and sets) associated with 
agent i will be indexed by the superscript [i]. 

In this paper, we will study two particular cases of problem (1): one in which the global 
equality constraint $h(x) = 0$ is not included, and the other in which all the local constraint sets 
are identical. For the case where the constraint $h(x) = 0$ is absent, the Slater's condition 2.1 
reduces to the existence of a vector $\bar{x} \in X$ such that $g(\bar{x}) < 0$. 

B. Network model 

We will consider that the multi-agent network operates synchronously. The topology of the 
network at time $k \ge 0$ will be represented by a directed weighted graph $\mathcal{G}(k) = (V, E(k), A(k))$ 
where $A(k) := [a^i_j(k)] \in \mathbb{R}^{N \times N}$ is the adjacency matrix with $a^i_j(k) \ge 0$ being the weight assigned 
to the edge $(j, i)$ and $E(k) \subseteq V \times V \setminus \mathrm{diag}(V)$ is the set of edges with non-zero weights $a^i_j(k)$. The 
in-neighbors of node $i$ at time $k$ are denoted by $\mathcal{N}^{[i]}(k) = \{j \in V \mid (j, i) \in E(k) \text{ and } j \ne i\}$. We 
here make the following assumptions on the network communication graphs, which are standard 
in the analysis of average consensus algorithms; e.g., see [25], [27], and distributed optimization 
in [21], [23]. 

Assumption 2.2 (Non-degeneracy): There exists a constant $\alpha > 0$ such that $a^i_i(k) \ge \alpha$, and 
$a^i_j(k)$, for $i \ne j$, satisfies $a^i_j(k) \in \{0\} \cup [\alpha, 1]$, for all $k \ge 0$. 

Assumption 2.3 (Balanced Communication):^1 It holds that $\sum_{j=1}^{N} a^i_j(k) = 1$ for all $i \in V$ 
and $k \ge 0$, and $\sum_{i=1}^{N} a^i_j(k) = 1$ for all $j \in V$ and $k \ge 0$. 

Assumption 2.4 (Periodical Strong Connectivity): There is a positive integer $B$ such that, 
for all $k_0 \ge 0$, the directed graph $(V, \cup_{k=0}^{B-1} E(k_0 + k))$ is strongly connected. 
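
The three network assumptions above can be checked numerically on a given sequence of weight matrices. The following is a minimal sketch, assuming the matrices are available as NumPy arrays; the function names and the two example matrices are illustrative and are not part of the paper.

```python
import numpy as np

def is_doubly_stochastic(A, tol=1e-9):
    """Balanced communication: every row and every column sums to one."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A >= -tol)
                and np.allclose(A.sum(axis=0), 1.0, atol=tol)
                and np.allclose(A.sum(axis=1), 1.0, atol=tol))

def is_non_degenerate(A, alpha, tol=1e-12):
    """Diagonal weights >= alpha; off-diagonal weights are 0 or in [alpha, 1]."""
    A = np.asarray(A, dtype=float)
    diag_ok = np.all(np.diag(A) >= alpha - tol)
    off = A[~np.eye(A.shape[0], dtype=bool)]
    off_ok = np.all((np.abs(off) <= tol) | ((off >= alpha - tol) & (off <= 1 + tol)))
    return bool(diag_ok and off_ok)

def union_strongly_connected(mats, tol=1e-12):
    """Strong connectivity of the union of the edge sets of the given matrices."""
    n = np.asarray(mats[0]).shape[0]
    adj = np.zeros((n, n), dtype=bool)
    for A in mats:
        adj |= np.asarray(A) > tol
    np.fill_diagonal(adj, True)          # self-loops do not affect reachability
    reach = adj.copy()
    for _ in range(n):                   # repeated squaring of the reachability relation
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    return bool(reach.all())

# Hypothetical example with N = 3: neither graph is connected on its own,
# but their union over a window of length B = 2 is strongly connected.
A1 = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
A2 = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]])
```

Strong connectivity does not depend on edge orientation conventions, so the check works whether entry (i, j) is read as the edge (j, i) or (i, j).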

C. Notion and notations 

The following notion of saddle point plays a critical role in our paper. 

^1 It is also referred to as double stochasticity. 
Definition 2.1 (Saddle point): Consider a function $\phi : X \times M \to \mathbb{R}$ where $X$ and $M$ are 
non-empty subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$. A pair of vectors $(x^*, \mu^*) \in X \times M$ is called a saddle point 
of $\phi$ over $X \times M$ if $\phi(x^*, \mu) \le \phi(x^*, \mu^*) \le \phi(x, \mu^*)$ hold for all $(x, \mu) \in X \times M$. 

Remark 2.2: Equivalently, $(x^*, \mu^*)$ is a saddle point of $\phi$ over $X \times M$ if and only if $(x^*, \mu^*) \in 
X \times M$, and $\sup_{\mu \in M} \phi(x^*, \mu) \le \phi(x^*, \mu^*) \le \inf_{x \in X} \phi(x, \mu^*)$. • 

In this paper, we do not assume the differentiability of $f^{[i]}$ and $g_\ell$. At the points where the 
function is not differentiable, the subgradient plays the role of the gradient. For a given convex 
function $F : \mathbb{R}^n \to \mathbb{R}$ and a point $\bar{x} \in \mathbb{R}^n$, a subgradient of the function $F$ at $\bar{x}$ is a vector 
$\mathcal{D}F(\bar{x}) \in \mathbb{R}^n$ such that the following subgradient inequality holds for any $x \in \mathbb{R}^n$: 

$\mathcal{D}F(\bar{x})^T (x - \bar{x}) \le F(x) - F(\bar{x})$. 

Similarly, for a given concave function $G : \mathbb{R}^m \to \mathbb{R}$ and a point $\bar{\mu} \in \mathbb{R}^m$, a supgradient 
of the function $G$ at $\bar{\mu}$ is a vector $\mathcal{D}G(\bar{\mu}) \in \mathbb{R}^m$ such that the following supgradient inequality 
holds for any $\mu \in \mathbb{R}^m$: 

$\mathcal{D}G(\bar{\mu})^T (\mu - \bar{\mu}) \ge G(\mu) - G(\bar{\mu})$. 

Given a set $S$, we denote by $\mathrm{co}(S)$ its convex hull. We let the function $[\cdot]^+ : \mathbb{R}^m \to \mathbb{R}^m_{\ge 0}$ 
denote the projection operator onto the non-negative orthant in $\mathbb{R}^m$. For any vector $c \in \mathbb{R}^n$, we 
denote $|c| := (|c_1|, \dots, |c_n|)^T$, while $\|\cdot\|$ is the 2-norm in the Euclidean space. 
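
To make the subgradient inequality and the projection onto the non-negative orthant concrete, the sketch below checks them numerically for the non-differentiable function F(x) = ||x||_1. The choice of F and the helper names are assumptions made for illustration only.

```python
import numpy as np

def positive_part(c):
    """Projection of c onto the non-negative orthant, i.e. [c]^+."""
    return np.maximum(c, 0.0)

def l1_subgradient(x):
    """One subgradient of F(x) = ||x||_1 (sign(0) = 0 is a valid choice)."""
    return np.sign(x)

def satisfies_subgradient_inequality(F, d, x_bar, x, tol=1e-12):
    """Check d^T (x - x_bar) <= F(x) - F(x_bar) for a single test point x."""
    return bool(d @ (x - x_bar) <= F(x) - F(x_bar) + tol)

F = lambda x: np.abs(x).sum()
x_bar = np.array([1.0, 0.0, -2.0])      # F is not differentiable here (second entry is 0)
d = l1_subgradient(x_bar)
```

The inequality must hold for every x, so a numerical check over sampled points can only refute a candidate subgradient, not prove it; for the 1-norm the inequality follows directly from sign(x_bar)^T x <= ||x||_1.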

III. Case (i): absence of equality constraint 

In this section, we study the case of problem (1) where the equality constraint $h(x) = 0$ is 
absent; i.e., problem (1) becomes 

$\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f^{[i]}(x)$, s.t. $g(x) \le 0$, $x \in \cap_{i=1}^{N} X^{[i]}$. (2) 

We first provide some preliminaries, including a Lagrangian saddle-point characterization of 
problem (2) and finding a superset containing the Lagrangian dual optimal set of problem (2). 
After that, we present the distributed Lagrangian primal-dual subgradient algorithm and summarize 
its convergence properties. 

A. Preliminaries 

We here develop some preliminary results which are essential to the design of the distributed 
Lagrangian primal-dual subgradient algorithm. 
1) A Lagrangian saddle-point characterization: Firstly, problem (2) is equivalent to 

$\min_{x \in \mathbb{R}^n} f(x)$, s.t. $N g(x) \le 0$, $x \in X$, 

with associated Lagrangian dual problem given by 

$\max_{\mu \in \mathbb{R}^m} q_L(\mu)$, s.t. $\mu \ge 0$. 

Here, the Lagrangian dual function, $q_L : \mathbb{R}^m_{\ge 0} \to \mathbb{R}$, is defined as $q_L(\mu) := \inf_{x \in X} \mathcal{L}(x, \mu)$, where 
$\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \to \mathbb{R}$ is the Lagrangian function $\mathcal{L}(x, \mu) = f(x) + N \mu^T g(x)$. We denote the 
Lagrangian dual optimal value of the Lagrangian dual problem by $d_L^*$ and the set of Lagrangian 
dual optimal points by $D_L^*$. As is well-known, under the Slater's condition 2.1, the property of 
strong duality holds; i.e., $p^* = d_L^*$, and $D_L^* \ne \emptyset$. The following theorem is a standard result 
on Lagrangian duality stating that the primal and Lagrangian dual optimal solutions can be 
characterized as the saddle points of the Lagrangian function. 

Theorem 3.1 (Lagrangian Saddle-point Theorem [3]): The pair of $(x^*, \mu^*) \in X \times \mathbb{R}^m_{\ge 0}$ is 
a saddle point of the Lagrangian function $\mathcal{L}$ over $X \times \mathbb{R}^m_{\ge 0}$ if and only if it is a pair of primal 
and Lagrangian dual optimal solutions and the following Lagrangian minimax equality holds: 

$\sup_{\mu \in \mathbb{R}^m_{\ge 0}} \inf_{x \in X} \mathcal{L}(x, \mu) = \inf_{x \in X} \sup_{\mu \in \mathbb{R}^m_{\ge 0}} \mathcal{L}(x, \mu)$. 
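
As a sanity check of the saddle-point characterization, the following worked scalar instance can be verified by hand. The instance (a single agent, $f(x) = x^2$, $g(x) = 1 - x$, $X = [-2, 2]$) is an illustrative assumption and does not appear in the paper; any $\bar{x} \in (1, 2]$ is a Slater vector for it.

```latex
% Illustrative instance (not from the paper): N = 1, n = m = 1
f(x) = x^2, \quad g(x) = 1 - x, \quad X = [-2, 2], \quad
\mathcal{L}(x,\mu) = x^2 + \mu (1 - x).
% Dual function: for 0 <= mu <= 4 the minimizer over X is x = mu / 2,
% for mu > 4 the minimizer is the boundary point x = 2
q_L(\mu) = \mu - \tfrac{\mu^2}{4} \ \ (0 \le \mu \le 4), \qquad
q_L(\mu) = 4 - \mu \ \ (\mu > 4).
% Optimal values and saddle point
p^* = f(1) = 1, \qquad d_L^* = q_L(2) = 1, \qquad (x^*, \mu^*) = (1, 2),
% Saddle-point inequalities for all mu >= 0 and all x in X
\mathcal{L}(1,\mu) = 1 \;\le\; \mathcal{L}(1,2) = 1 \;\le\; \mathcal{L}(x,2) = (x-1)^2 + 1 .
```

Here strong duality holds with $p^* = d_L^* = 1$, and the pair $(1, 2)$ is both primal-dual optimal and a saddle point, as the theorem predicts.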

The following lemma presents some preliminary analysis of Lagrangian saddle points. 
Lemma 3.1 (Preliminary results of Lagrangian saddle points): Let $M$ be any superset of 
$D_L^*$. 

(a) If $(x^*, \mu^*)$ is a saddle point of $\mathcal{L}$ over $X \times \mathbb{R}^m_{\ge 0}$, then $(x^*, \mu^*)$ is also a saddle point of $\mathcal{L}$ 
over $X \times M$. 

(b) There is at least one saddle point of $\mathcal{L}$ over $X \times M$. 

(c) If $(\tilde{x}, \tilde{\mu})$ is a saddle point of $\mathcal{L}$ over $X \times M$, then $\mathcal{L}(\tilde{x}, \tilde{\mu}) = p^*$ and $\tilde{\mu}$ is Lagrangian dual 
optimal. 

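Proof: (a) It just follows from the definition of saddle point of $\mathcal{L}$ over $X \times M$. 
(b) Observe that 

$\sup_{\mu \in \mathbb{R}^m_{\ge 0}} \inf_{x \in X} \mathcal{L}(x, \mu) = \sup_{\mu \in \mathbb{R}^m_{\ge 0}} q_L(\mu) = d_L^*$, 

$\inf_{x \in X} \sup_{\mu \in \mathbb{R}^m_{\ge 0}} \mathcal{L}(x, \mu) = \inf_{x \in X \cap Y} f(x) = p^*$. 

Since the Slater's condition 2.1 implies zero duality gap, the Lagrangian minimax equality holds. 
From Theorem 3.1 it follows that the set of saddle points of $\mathcal{L}$ over $X \times \mathbb{R}^m_{\ge 0}$ is the Cartesian 
product $X^* \times D_L^*$. Recall that $X^*$ and $D_L^*$ are non-empty, so we can guarantee the existence of 
the saddle point of $\mathcal{L}$ over $X \times \mathbb{R}^m_{\ge 0}$. Then by (a), we have that (b) holds. 

(c) Pick any saddle point $(x^*, \mu^*)$ of $\mathcal{L}$ over $X \times \mathbb{R}^m_{\ge 0}$. Since the Slater's condition 2.1 holds, 
from Theorem 3.1 one can deduce that $(x^*, \mu^*)$ is a pair of primal and Lagrangian dual optimal 
solutions. This implies that 

$d_L^* = \inf_{x \in X} \mathcal{L}(x, \mu^*) \le \mathcal{L}(x^*, \mu^*) \le \sup_{\mu \in \mathbb{R}^m_{\ge 0}} \mathcal{L}(x^*, \mu) = p^*$. 

From Theorem 3.1, we have $d_L^* = p^*$. Hence, $\mathcal{L}(x^*, \mu^*) = p^*$. On the other hand, we pick 
any saddle point $(\tilde{x}, \tilde{\mu})$ of $\mathcal{L}$ over $X \times M$. Then for all $x \in X$ and $\mu \in M$, it holds that 
$\mathcal{L}(\tilde{x}, \mu) \le \mathcal{L}(\tilde{x}, \tilde{\mu}) \le \mathcal{L}(x, \tilde{\mu})$. By Theorem 3.1, then $\mu^* \in D_L^* \subseteq M$. Recall $x^* \in X$, and thus 
we have $\mathcal{L}(\tilde{x}, \mu^*) \le \mathcal{L}(\tilde{x}, \tilde{\mu}) \le \mathcal{L}(x^*, \tilde{\mu})$. Since $\tilde{x} \in X$ and $\tilde{\mu} \in \mathbb{R}^m_{\ge 0}$, we have $\mathcal{L}(x^*, \tilde{\mu}) \le 
\mathcal{L}(x^*, \mu^*) \le \mathcal{L}(\tilde{x}, \mu^*)$. Combining the above two relations gives that $\mathcal{L}(\tilde{x}, \tilde{\mu}) = \mathcal{L}(x^*, \mu^*) = p^*$. 
From Remark 2.2, we see that $\mathcal{L}(\tilde{x}, \tilde{\mu}) \le \inf_{x \in X} \mathcal{L}(x, \tilde{\mu}) = q_L(\tilde{\mu})$. Since $\mathcal{L}(\tilde{x}, \tilde{\mu}) = p^* = d_L^* \ge 
q_L(\tilde{\mu})$, then $q_L(\tilde{\mu}) = d_L^*$ and thus $\tilde{\mu}$ is a Lagrangian dual optimal solution. ■ 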

Remark 3.1: Despite that (c) holds, the reverse of (a) may not be true in general. In particular, 
$\tilde{x}$ may be infeasible; i.e., $g_\ell(\tilde{x}) > 0$ for some $\ell \in \{1, \dots, m\}$. • 

2) An upper estimate of the Lagrangian dual optimal set: In what follows, we will find a 
compact superset of $D_L^*$. To do that, we define the following primal problem for each agent $i$: 

$\min_{x \in \mathbb{R}^n} f^{[i]}(x)$, s.t. $g(x) \le 0$, $x \in X^{[i]}$. 

Due to the fact that $X^{[i]}$ is compact and the $f^{[i]}$ are continuous, the primal optimal value $p_i^*$ of 
each agent's primal problem is finite and the set of its primal optimal solutions is non-empty. 
The associated dual problem is given by 

$\max_{\mu \in \mathbb{R}^m} q^{[i]}(\mu)$, s.t. $\mu \ge 0$. 

Here, the dual function $q^{[i]} : \mathbb{R}^m_{\ge 0} \to \mathbb{R}$ is defined by $q^{[i]}(\mu) := \inf_{x \in X^{[i]}} \mathcal{L}^{[i]}(x, \mu)$, where $\mathcal{L}^{[i]} : 
\mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \to \mathbb{R}$ is the Lagrangian function of agent $i$ and given by $\mathcal{L}^{[i]}(x, \mu) = f^{[i]}(x) + \mu^T g(x)$. 
The corresponding dual optimal value is denoted by $d_i^*$. In this way, $\mathcal{L}$ is decomposed into a 
sum of local Lagrangian functions; i.e., $\mathcal{L}(x, \mu) = \sum_{i=1}^{N} \mathcal{L}^{[i]}(x, \mu)$. 
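Define now the set-valued map $Q : \mathbb{R}^m_{\ge 0} \to 2^{(\mathbb{R}^m_{\ge 0})}$ by $Q(\bar{\mu}) = \{\mu \in \mathbb{R}^m_{\ge 0} \mid q_L(\mu) \ge q_L(\bar{\mu})\}$. 
Additionally, define a function $\gamma : X \to \mathbb{R}$ by $\gamma(x) = \min_{\ell \in \{1, \dots, m\}} \{-g_\ell(x)\}$. Observe that if $x$ 
is a Slater vector, then $\gamma(x) > 0$. The following lemma is a direct result of Lemma 1 in [20]. 

Lemma 3.2 (Boundedness of dual solution sets): The set $Q(\bar{\mu})$ is bounded for any $\bar{\mu} \in 
\mathbb{R}^m_{\ge 0}$, and, in particular, for any Slater vector $\bar{x}$, it holds that $\max_{\mu \in Q(\bar{\mu})} \|\mu\| \le \frac{1}{\gamma(\bar{x})} (f(\bar{x}) - q_L(\bar{\mu}))$. 
□ 

Notice that $D_L^* = \{\mu \in \mathbb{R}^m_{\ge 0} \mid q_L(\mu) \ge d_L^*\}$. Picking any Slater vector $\bar{x} \in X$, and letting 
$\bar{\mu} = \mu^* \in D_L^*$ in Lemma 3.2 gives that 

$\max_{\mu^* \in D_L^*} \|\mu^*\| \le \frac{1}{\gamma(\bar{x})} (f(\bar{x}) - d_L^*)$. (3) 

Define the function $r : X \times \mathbb{R}^m_{\ge 0} \to \mathbb{R} \cup \{+\infty\}$ by $r(x, \mu) := \frac{N}{\gamma(x)} \max_{i \in V} \{f^{[i]}(x) - q^{[i]}(\mu)\}$. 
By the property of weak duality, it holds that $d_i^* \le p_i^*$ and thus $f^{[i]}(x) \ge q^{[i]}(\mu)$ for any $(x, \mu) \in 
X \times \mathbb{R}^m_{\ge 0}$. Since $\gamma(\bar{x}) > 0$, thus $r(\bar{x}, \mu) \ge 0$ for any $\mu \in \mathbb{R}^m_{\ge 0}$. With this observation, we pick any 
$\bar{\mu} \in \mathbb{R}^m_{\ge 0}$ and the following set is well-defined: $M^{[i]}(\bar{x}, \bar{\mu}) := \{\mu \in \mathbb{R}^m_{\ge 0} \mid \|\mu\| \le r(\bar{x}, \bar{\mu}) + \theta^{[i]}\}$ 
for some $\theta^{[i]} \in \mathbb{R}_{\ge 0}$. Observe that for all $\mu \in \mathbb{R}^m_{\ge 0}$: 

$q_L(\mu) = \inf_{x \in X} \sum_{i=1}^{N} \mathcal{L}^{[i]}(x, \mu) \ge \sum_{i=1}^{N} \inf_{x \in X^{[i]}} \mathcal{L}^{[i]}(x, \mu) = \sum_{i=1}^{N} q^{[i]}(\mu)$. (4) 

Since $d_L^* \ge q_L(\bar{\mu})$, it follows from (3) and (4) that 

$\max_{\mu^* \in D_L^*} \|\mu^*\| \le \frac{1}{\gamma(\bar{x})} (f(\bar{x}) - q_L(\bar{\mu})) \le \frac{1}{\gamma(\bar{x})} \sum_{i=1}^{N} (f^{[i]}(\bar{x}) - q^{[i]}(\bar{\mu})) 
\le \frac{N}{\gamma(\bar{x})} \max_{i \in V} \{f^{[i]}(\bar{x}) - q^{[i]}(\bar{\mu})\} = r(\bar{x}, \bar{\mu})$. 

Hence, we have $D_L^* \subseteq M^{[i]}(\bar{x}, \bar{\mu})$ for all $i \in V$. 

Note that in order to compute $M^{[i]}(\bar{x}, \bar{\mu})$, all the agents have to agree on a common Slater vector 
$\bar{x} \in \cap_{i=1}^{N} X^{[i]}$ which should be obtained in a distributed fashion. To handle this difficulty, we 
now propose a distributed algorithm, namely Distributed Slater-vector Computation Algorithm, 
which allows each agent $i$ to compute a superset of $M^{[i]}(\bar{x}, \bar{\mu})$. 

Initially, each agent $i$ chooses a common value $\bar{\mu} \in \mathbb{R}^m_{\ge 0}$; e.g., $\bar{\mu} = 0$, and computes two 
positive constants $b^{[i]}(0)$ and $c^{[i]}(0)$ such that $b^{[i]}(0) \ge \sup_{x \in J^{[i]}} \{f^{[i]}(x) - q^{[i]}(\bar{\mu})\}$ and $c^{[i]}(0) \le 
\min_{1 \le \ell \le m} \inf_{x \in J^{[i]}} \{-g_\ell(x)\}$ where $J^{[i]} := \{x \in X^{[i]} \mid g(x) < 0\}$. 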
At every time $k \ge 0$, each agent $i$ updates its estimates by using the following rules: 

$b^{[i]}(k+1) = \max_{j \in \mathcal{N}^{[i]}(k) \cup \{i\}} b^{[j]}(k)$, $c^{[i]}(k+1) = \min_{j \in \mathcal{N}^{[i]}(k) \cup \{i\}} c^{[j]}(k)$. 

Lemma 3.3 (Convergence properties of the Distributed Slater-vector Computation Algorithm): 
Assume that the periodical strong connectivity assumption 2.4 holds. Consider the sequences 
of $\{b^{[i]}(k)\}$ and $\{c^{[i]}(k)\}$ generated by the Distributed Slater-vector Computation Algorithm. It 
holds that after at most $(N-1)B$ steps, all the agents reach the consensus, i.e., $b^{[i]}(k) = b^* := 
\max_{j \in V} b^{[j]}(0)$ and $c^{[i]}(k) = c^* := \min_{j \in V} c^{[j]}(0)$ for all $k \ge (N-1)B$. Furthermore, we have 
$M^{[i]}(\bar{\mu}) := \{\mu \in \mathbb{R}^m_{\ge 0} \mid \|\mu\| \le \frac{N b^*}{c^*} + \theta^{[i]}\} \supseteq M^{[i]}(\bar{x}, \bar{\mu})$ for $i \in V$. 

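Proof: It is not difficult to verify achieving max-consensus and min-consensus by using 
the periodical strong connectivity assumption 2.4. Note that $J := \{x \in X \mid g(x) < 0\} \subseteq J^{[i]}$, 
$\forall i \in V$. Hence, we have 

$\max_{i \in V} \sup_{x \in J} \{f^{[i]}(x) - q^{[i]}(\bar{\mu})\} \le \max_{i \in V} \sup_{x \in J^{[i]}} \{f^{[i]}(x) - q^{[i]}(\bar{\mu})\} \le b^*$, 

$\inf_{x \in J} \min_{1 \le \ell \le m} \{-g_\ell(x)\} \ge \min_{i \in V} \inf_{x \in J^{[i]}} \min_{1 \le \ell \le m} \{-g_\ell(x)\} \ge c^*$. 

Since $\bar{x} \in J$, then the following estimate on $r(\bar{x}, \bar{\mu})$ holds: 

$r(\bar{x}, \bar{\mu}) \le \frac{N \sup_{x \in J} \max_{i \in V} \{f^{[i]}(x) - q^{[i]}(\bar{\mu})\}}{\inf_{x \in J} \min_{1 \le \ell \le m} \{-g_\ell(x)\}} \le \frac{N b^*}{c^*}$. 

The desired result immediately follows. ■ 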
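
The max-consensus and min-consensus updates of the Distributed Slater-vector Computation Algorithm can be simulated in a few lines. The sketch below is only an illustration: the directed ring topology (so B = 1), the initial constants and the function name are assumptions, not values from the paper.

```python
import numpy as np

def slater_consensus(b0, c0, neighbor_sets):
    """Run b_i <- max over in-neighbors and c_i <- min over in-neighbors.

    neighbor_sets[k][i] lists the in-neighbors of agent i at time k;
    agent i itself is always included by the update.
    """
    b = np.array(b0, dtype=float)
    c = np.array(c0, dtype=float)
    n = len(b)
    for nbrs in neighbor_sets:
        b = np.array([max(b[j] for j in set(nbrs[i]) | {i}) for i in range(n)])
        c = np.array([min(c[j] for j in set(nbrs[i]) | {i}) for i in range(n)])
    return b, c

# Hypothetical example: N = 4 agents on a fixed directed ring (B = 1),
# so consensus is reached after (N - 1) * B = 3 steps.
N = 4
ring = [[(i - 1) % N] for i in range(N)]
b_final, c_final = slater_consensus([3.0, 1.0, 4.0, 2.0],
                                    [0.5, 0.2, 0.9, 0.4],
                                    [ring] * (N - 1))
radius = N * b_final[0] / c_final[0]   # the common quantity N b* / c*
```

After consensus every agent holds the same pair (b*, c*), so each can form its own ball of radius N b*/c* plus its local margin without ever agreeing on a common Slater vector.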

From Lemma 3.3 and the fact that $D_L^* \subseteq M^{[i]}(\bar{x}, \bar{\mu})$, we can see that the set of $M(\bar{\mu}) := 
\cap_{i=1}^{N} M^{[i]}(\bar{\mu})$ contains $D_L^*$. In addition, $M^{[i]}(\bar{\mu})$ and $M(\bar{\mu})$ are non-empty, compact and convex. 
To simplify the notations, we will use the shorthands $M^{[i]} := M^{[i]}(\bar{\mu})$ and $M := M(\bar{\mu})$. 

3) Convexity of $\mathcal{L}$: For each $\mu \in \mathbb{R}^m_{\ge 0}$, we define the function $\mathcal{L}^{[i]}_\mu : \mathbb{R}^n \to \mathbb{R}$ as $\mathcal{L}^{[i]}_\mu(x) := 
\mathcal{L}^{[i]}(x, \mu)$. Note that $\mathcal{L}^{[i]}_\mu$ is convex since it is a nonnegative weighted sum of convex functions. 
For each $x \in \mathbb{R}^n$, we define the function $\mathcal{L}^{[i]}_x : \mathbb{R}^m_{\ge 0} \to \mathbb{R}$ as $\mathcal{L}^{[i]}_x(\mu) := \mathcal{L}^{[i]}(x, \mu)$. It is easy to 
check that $\mathcal{L}^{[i]}_x$ is a concave (actually affine) function. Then the Lagrangian function $\mathcal{L}$ is the 
sum of a collection of convex-concave local functions. This property motivates us to significantly 
extend primal-dual subgradient methods in [1], [22] to the networked multi-agent scenario. 

B. Distributed Lagrangian primal-dual subgradient algorithm 

Here, we introduce the Distributed Lagrangian Primal-Dual Subgradient Algorithm (DLPDS, 
for short) to find a saddle point of the Lagrangian function C over X x M and the optimal value. 
This saddle point will coincide with a pair of primal and Lagrangian dual optimal solutions, which 
is not always the case for saddle points over $X \times M$; see Remark 3.1. 

Through the algorithm, at each time $k$, each agent $i$ maintains the estimate $(x^{[i]}(k), \mu^{[i]}(k))$ 
of the saddle point of the Lagrangian function $\mathcal{L}$ over $X \times M$ and the estimate $y^{[i]}(k)$ of $p^*$. To 
produce $x^{[i]}(k+1)$ (resp. $\mu^{[i]}(k+1)$), agent $i$ takes a convex combination $v_x^{[i]}(k)$ (resp. $v_\mu^{[i]}(k)$) of 
its estimate $x^{[i]}(k)$ (resp. $\mu^{[i]}(k)$) with the estimates sent from its neighboring agents at time $k$, 
makes a subgradient (resp. supgradient) step to minimize (resp. maximize) the local Lagrangian 
function $\mathcal{L}^{[i]}$, and takes a primal (resp. dual) projection onto the local constraint $X^{[i]}$ (resp. $M^{[i]}$). 
Furthermore, agent $i$ generates the estimate $y^{[i]}(k+1)$ by taking a convex combination $v_y^{[i]}(k)$ of 
its estimate $y^{[i]}(k)$ with the estimates of its neighbors at time $k$ and taking one step to track the 
variation of the local objective function $f^{[i]}$. The DLPDS algorithm is formally stated as follows: 

Initially, each agent $i$ picks a common $\bar{\mu} \in \mathbb{R}^m_{\ge 0}$ and computes the set $M^{[i]}$ with some $\theta^{[i]} > 0$ 
by using the Distributed Slater-vector Computation Algorithm. Furthermore, agent $i$ chooses any 
initial state $x^{[i]}(0) \in X^{[i]}$, $\mu^{[i]}(0) \in \mathbb{R}^m_{\ge 0}$, and $y^{[i]}(1) = N f^{[i]}(x^{[i]}(0))$. 

At every $k \ge 0$, each agent $i$ generates $x^{[i]}(k+1)$, $\mu^{[i]}(k+1)$ and $y^{[i]}(k+1)$ according to the 
following rules: 

$v_x^{[i]}(k) = \sum_{j=1}^{N} a^i_j(k) x^{[j]}(k)$, $v_\mu^{[i]}(k) = \sum_{j=1}^{N} a^i_j(k) \mu^{[j]}(k)$, $v_y^{[i]}(k) = \sum_{j=1}^{N} a^i_j(k) y^{[j]}(k)$, 

$x^{[i]}(k+1) = P_{X^{[i]}}[v_x^{[i]}(k) - \alpha(k) \mathcal{D}_x^{[i]}(k)]$, $\mu^{[i]}(k+1) = P_{M^{[i]}}[v_\mu^{[i]}(k) + \alpha(k) \mathcal{D}_\mu^{[i]}(k)]$, 
$y^{[i]}(k+1) = v_y^{[i]}(k) + N (f^{[i]}(x^{[i]}(k)) - f^{[i]}(x^{[i]}(k-1)))$, 

where $P_{X^{[i]}}$ (resp. $P_{M^{[i]}}$) is the projection operator onto the set $X^{[i]}$ (resp. $M^{[i]}$), the scalars 
$a^i_j(k)$ are non-negative weights and the scalars $\alpha(k) > 0$ are step-sizes^2. We use the shorthands 
$\mathcal{D}_x^{[i]}(k) \equiv \mathcal{D}\mathcal{L}^{[i]}_{v_\mu^{[i]}(k)}(v_x^{[i]}(k))$, and $\mathcal{D}_\mu^{[i]}(k) \equiv \mathcal{D}\mathcal{L}^{[i]}_{v_x^{[i]}(k)}(v_\mu^{[i]}(k))$. 
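
The DLPDS update rules can be sketched on a toy scalar problem. Everything specific below is an assumption for illustration: the quadratic local objectives $f^{[i]}(x) = (x - t_i)^2$, the constraint $g(x) = 1 - x$, interval constraint sets, a dual set of the form $[0, R]$ (the case m = 1), a fixed doubly stochastic weight matrix, and the step-size $\alpha(k) = 1/(k+1)$.

```python
import numpy as np

def dlpds(targets, lo, hi, R, A, steps):
    """Toy DLPDS run with n = m = 1, f_i(x) = (x - t_i)^2 and g(x) = 1 - x."""
    t = np.asarray(targets, dtype=float)
    N = len(t)
    f = lambda x: (x - t) ** 2                 # vector of local objective values
    x = np.clip(np.zeros(N), lo, hi)           # x_i(0) in X_i = [lo_i, hi_i]
    x_prev = x.copy()
    mu = np.zeros(N)                           # mu_i(0) >= 0
    y = N * f(x)                               # y_i(1) = N f_i(x_i(0))
    for k in range(steps):
        alpha = 1.0 / (k + 1)                  # assumed diminishing step-size
        v_x, v_mu, v_y = A @ x, A @ mu, A @ y  # convex combinations with neighbors
        d_x = 2.0 * (v_x - t) - v_mu           # subgradient of L_i in x at (v_x, v_mu)
        d_mu = 1.0 - v_x                       # supgradient of L_i in mu, i.e. g(v_x)
        x_new = np.clip(v_x - alpha * d_x, lo, hi)   # projection onto X_i
        mu = np.clip(v_mu + alpha * d_mu, 0.0, R)    # projection onto M_i = [0, R]
        if k >= 1:                             # the y-update is executed for k >= 1
            y = v_y + N * (f(x) - f(x_prev))
        x_prev, x = x, x_new
    return x, mu, y

# Hypothetical data: three agents with different intervals X_i.
A = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
t = [0.0, 0.5, -0.5]
lo, hi = np.array([-2.0, -2.0, -2.0]), np.array([2.0, 3.0, 4.0])
x, mu, y = dlpds(t, lo, hi, R=10.0, A=A, steps=200)
```

For this instance the constrained minimizer is x = 1 with multiplier 2 and optimal value 3.5, which the estimates are expected to approach as the number of steps grows. One property holds exactly at every step when the weights are doubly stochastic: the average of the $y^{[i]}(k)$ equals $\sum_i f^{[i]}(x^{[i]}(k-1))$, which is the tracking mechanism behind the optimal-value estimate.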

The following theorem summarizes the convergence properties of the DLPDS algorithm where 
agents asymptotically agree upon a pair of primal-dual optimal solutions. 

Theorem 3.2 (Convergence properties of the DLPDS algorithm): Consider the optimization 
problem (2). Let the non-degeneracy assumption 2.2, the balanced communication assumption 2.3 
and the periodic strong connectivity assumption 2.4 hold. Consider the sequences of 
$\{x^{[i]}(k)\}$, $\{\mu^{[i]}(k)\}$ and $\{y^{[i]}(k)\}$ of the distributed Lagrangian primal-dual subgradient algorithm 
with the step-sizes $\{\alpha(k)\}$ satisfying $\lim_{k \to +\infty} \alpha(k) = 0$, $\sum_{k=0}^{+\infty} \alpha(k) = +\infty$, and $\sum_{k=0}^{+\infty} \alpha(k)^2 < +\infty$. 
Then, there is a pair of primal and Lagrangian dual optimal solutions $(x^*, \mu^*) \in X^* \times D_L^*$ such 
that $\lim_{k \to +\infty} \|x^{[i]}(k) - x^*\| = 0$ and $\lim_{k \to +\infty} \|\mu^{[i]}(k) - \mu^*\| = 0$ for all $i \in V$. Furthermore, we have 
that $\lim_{k \to +\infty} \|y^{[i]}(k) - p^*\| = 0$ for all $i \in V$. 

^2 Each agent $i$ executes the update law of $y^{[i]}(k)$ for $k \ge 1$. 



Remark 3.2: For a convex-concave function, continuous-time gradient-based methods are 
proved in [1] to converge globally towards the saddle-point. Recently, [22] presents (discrete-time) 
primal-dual subgradient methods which relax the differentiability in [1] and further incorporate 
state constraints. The method in [1] is adopted by [16] and [29] to study a distributed 
optimization problem on fixed graphs where objective functions are separable. 

The DLPDS algorithm is a generalization of primal-dual subgradient methods in [22] to the 
networked multi-agent scenario. It is also an extension of the distributed projected subgradient 
algorithm in [23] to solve multi-agent convex optimization problems with inequality constraints. 
Additionally, the DLPDS algorithm enables agents to find the optimal value. Furthermore, the 
DLPDS algorithm objective is that of reaching a saddle point of the Lagrangian function in 
contrast to achieving a (primal) optimal solution in [23]. • 



IV. Case (ii): identical local constraint sets 

In the last section, we studied the case where the equality constraint is absent in problem (1). In 
this section, we turn our attention to another case of problem (1) where $h(x) = 0$ is taken into 
account but we require that local constraint sets are identical; i.e., $X^{[i]} = X$ for all $i \in V$. We 
first adopt a penalty relaxation and provide a penalty saddle-point characterization of primal 
problem (1) with $X^{[i]} = X$. We then introduce the distributed penalty primal-dual subgradient 
algorithm, followed by its convergence properties and some remarks. 

A. Preliminaries 

Some preliminary results are presented in this part, and these results are essential to the 
development of the distributed penalty primal-dual subgradient algorithm. 

1) A penalty saddle-point characterization: Note that the primal problem (1) with $X^{[i]} = X$ 
is trivially equivalent to the following: 

$\min_{x \in \mathbb{R}^n} f(x)$, s.t. $N g(x) \le 0$, $N h(x) = 0$, $x \in X$, (5) 

with associated penalty dual problem given by 

$\max_{\mu \in \mathbb{R}^m, \lambda \in \mathbb{R}^\nu} q_P(\mu, \lambda)$, s.t. $\mu \ge 0$, $\lambda \ge 0$. (6) 

Here, the penalty dual function, $q_P : \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0} \to \mathbb{R}$, is defined by $q_P(\mu, \lambda) := \inf_{x \in X} \mathcal{H}(x, \mu, \lambda)$, 
where $\mathcal{H} : \mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0} \to \mathbb{R}$ is the penalty function given by $\mathcal{H}(x, \mu, \lambda) = f(x) + 
N \mu^T [g(x)]^+ + N \lambda^T |h(x)|$. We denote the penalty dual optimal value by $d_P^*$ and the set of penalty 
dual optimal solutions by $D_P^*$. We define the penalty function $\mathcal{H}^{[i]}(x, \mu, \lambda) : \mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0} \to \mathbb{R}$ 
for each agent $i$ as follows: $\mathcal{H}^{[i]}(x, \mu, \lambda) = f^{[i]}(x) + \mu^T [g(x)]^+ + \lambda^T |h(x)|$. In this way, we have 
that $\mathcal{H}(x, \mu, \lambda) = \sum_{i=1}^{N} \mathcal{H}^{[i]}(x, \mu, \lambda)$. As proven in the next lemma, the Slater's condition 2.1 
ensures zero duality gap and the existence of penalty dual optimal solutions. 
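
The decomposition of the penalty function into local penalty functions is easy to check numerically. In the sketch below the two local objectives, the constraint functions and the test points are hypothetical and only serve to illustrate the definitions.

```python
import numpy as np

def local_penalty(f_i, g, h, x, mu, lam):
    """H_i(x, mu, lam) = f_i(x) + mu^T [g(x)]^+ + lam^T |h(x)|."""
    return f_i(x) + mu @ np.maximum(g(x), 0.0) + lam @ np.abs(h(x))

def global_penalty(fs, g, h, x, mu, lam):
    """H(x, mu, lam) = f(x) + N mu^T [g(x)]^+ + N lam^T |h(x)|, with f = sum_i f_i."""
    N = len(fs)
    return (sum(f_i(x) for f_i in fs)
            + N * (mu @ np.maximum(g(x), 0.0))
            + N * (lam @ np.abs(h(x))))

# Hypothetical instance with N = 2 agents, n = 2, m = 1, nu = 1.
fs = [lambda x: x @ x, lambda x: np.sum(np.abs(x))]
g = lambda x: np.array([x[0] - 1.0])            # g(x) <= 0  <=>  x_1 <= 1
h = lambda x: np.array([x[0] + x[1] - 1.0])     # h(x) = 0   <=>  x_1 + x_2 = 1
mu, lam = np.array([3.0]), np.array([0.5])

x_infeasible = np.array([2.0, 0.5])
x_feasible = np.array([0.5, 0.5])
H_sum = sum(local_penalty(f_i, g, h, x_infeasible, mu, lam) for f_i in fs)
H_glob = global_penalty(fs, g, h, x_infeasible, mu, lam)
```

At a feasible point both penalty terms vanish and the penalty function reduces to the objective, which is the fact used later to bound the penalty dual function by $p^*$.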

Lemma 4.1 (Strong duality and non-emptiness of the penalty dual optimal set): The values 
of $p^*$ and $d_P^*$ coincide, and $D_P^*$ is non-empty. 

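Proof: Consider the auxiliary Lagrangian function $\mathcal{L}_a : \mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu \to \mathbb{R}$ given by 
$\mathcal{L}_a(x, \mu, \lambda) = f(x) + N \mu^T g(x) + N \lambda^T h(x)$, with the associated dual problem defined by 

$\max_{\mu \in \mathbb{R}^m, \lambda \in \mathbb{R}^\nu} q_a(\mu, \lambda)$, s.t. $\mu \ge 0$. (7) 

Here, the dual function, $q_a : \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu \to \mathbb{R}$, is defined by $q_a(\mu, \lambda) := \inf_{x \in X} \mathcal{L}_a(x, \mu, \lambda)$. The 
dual optimal value of problem (7) is denoted by $d_a^*$ and the set of dual optimal solutions is 
denoted $D_a^*$. Since $X$ is convex, $f$ and $g_\ell$, for $\ell \in \{1, \dots, m\}$, are convex, $p^*$ is finite and the 
Slater's condition 2.1 holds, it follows from Proposition 5.3.5 in [3] that $p^* = d_a^*$ and $D_a^* \ne \emptyset$. 
We now proceed to characterize $d_P^*$ and $D_P^*$. Pick any $(\mu^*, \lambda^*) \in D_a^*$. Since $\mu^* \ge 0$, then 

$d_a^* = q_a(\mu^*, \lambda^*) = \inf_{x \in X} \{f(x) + N (\mu^*)^T g(x) + N (\lambda^*)^T h(x)\}$ 
$\le \inf_{x \in X} \{f(x) + N (\mu^*)^T [g(x)]^+ + N |\lambda^*|^T |h(x)|\} = q_P(\mu^*, |\lambda^*|) \le d_P^*$. (8) 

On the other hand, pick any $x^* \in X^*$. Then $x^*$ is feasible, i.e., $x^* \in X$, $[g(x^*)]^+ = 0$ and 
$|h(x^*)| = 0$. It implies that $q_P(\mu, \lambda) \le \mathcal{H}(x^*, \mu, \lambda) = f(x^*) = p^*$ holds for any $\mu \in \mathbb{R}^m_{\ge 0}$ and 
$\lambda \in \mathbb{R}^\nu_{\ge 0}$, and thus $d_P^* = \sup_{\mu \in \mathbb{R}^m_{\ge 0}, \lambda \in \mathbb{R}^\nu_{\ge 0}} q_P(\mu, \lambda) \le p^* = d_a^*$. Therefore, we have $d_P^* = p^*$. 
To prove the non-emptiness of $D_P^*$, we pick any $(\mu^*, \lambda^*) \in D_a^*$. From (8) and $d_a^* = d_P^*$, we can 
see that $(\mu^*, |\lambda^*|) \in D_P^*$ and thus $D_P^* \ne \emptyset$. ■ 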

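The following is a slight extension of Theorem 3.1 to penalty functions. 

Theorem 4.1 (Penalty Saddle-point Theorem): The pair of $(x^*, \mu^*, \lambda^*)$ is a saddle point of 
the penalty function $\mathcal{H}$ over $X \times \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}$ if and only if it is a pair of primal and penalty 
dual optimal solutions and the following penalty minimax equality holds: 

$\sup_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \inf_{x \in X} \mathcal{H}(x, \mu, \lambda) = \inf_{x \in X} \sup_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \mathcal{H}(x, \mu, \lambda)$. 

Proof: The proof is analogous to that of Proposition 6.2.4 in [4], and for the sake of 
completeness, we provide the details here. It follows from Proposition 2.6.1 in [4] that $(x^*, \mu^*, \lambda^*)$ 
is a saddle point of $\mathcal{H}$ over $X \times \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}$ if and only if the penalty minimax equality holds 
and the following conditions are satisfied: 

$\sup_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \mathcal{H}(x^*, \mu, \lambda) = \min_{x \in X} \{\sup_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \mathcal{H}(x, \mu, \lambda)\}$, (9) 

$\inf_{x \in X} \mathcal{H}(x, \mu^*, \lambda^*) = \max_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \{\inf_{x \in X} \mathcal{H}(x, \mu, \lambda)\}$. (10) 

Notice that $\inf_{x \in X} \mathcal{H}(x, \mu, \lambda) = q_P(\mu, \lambda)$; and if $x \in Y$, then $\sup_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \mathcal{H}(x, \mu, \lambda) = f(x)$, 
otherwise, $\sup_{(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}} \mathcal{H}(x, \mu, \lambda) = +\infty$. Hence, the penalty minimax equality is equivalent 
to $d_P^* = p^*$. Condition (9) is equivalent to the fact that $x^*$ is primal optimal, and condition (10) 
is equivalent to $(\mu^*, \lambda^*)$ being a penalty dual optimal solution. ■ 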

2) Convexity of $\mathcal{H}$: Since $g_\ell$ is convex and $[\cdot]^+$ is convex and non-decreasing, $[g_\ell(x)]^+$ 
is convex in $x$ for each $\ell \in \{1, \dots, m\}$. Denote $A := (a_1, \dots, a_\nu)^T$. Since $|\cdot|$ is convex and 
$a_\ell^T x - b_\ell$ is an affine mapping, $|a_\ell^T x - b_\ell|$ is convex in $x$ for each $\ell \in \{1, \dots, \nu\}$. 

We denote $w := (\mu, \lambda)$. For each $w \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}$, we define the function $\mathcal{H}^{[i]}_w : \mathbb{R}^n \to \mathbb{R}$ 
as $\mathcal{H}^{[i]}_w(x) := \mathcal{H}^{[i]}(x, w)$. Note that $\mathcal{H}^{[i]}_w(x)$ is convex in $x$ by using the fact that a nonnegative 
weighted sum of convex functions is convex. For each $x \in \mathbb{R}^n$, we define the function $\mathcal{H}^{[i]}_x : 
\mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0} \to \mathbb{R}$ as $\mathcal{H}^{[i]}_x(w) := \mathcal{H}^{[i]}(x, w)$. It is easy to check that $\mathcal{H}^{[i]}_x(w)$ is concave (actually 
affine) in $w$. Then the penalty function $\mathcal{H}(x, w)$ is the sum of convex-concave local functions. 

Remark 4.1: The Lagrangian relaxation does not fit our approach here, since the Lagrangian 
function is not convex in $x$ when entries of $\lambda$ are allowed to be negative. • 
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B. Distributed penalty primal-dual subgradient algorithm 

We are now in the position to devise the Distributed Penalty Primal-Dual Subgradient Algorithm
(DPPDS, for short), which is based on the penalty saddle-point theorem 4.1, to find the
optimal value and a primal optimal solution to the primal problem (1) with $X^{[i]} = X$. The DPPDS
algorithm is formally described as follows.

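Initially, agent $i$ chooses any initial state $x^{[i]}(0) \in X$, $\mu^{[i]}(0) \in \mathbb{R}^m_{\ge 0}$, $\lambda^{[i]}(0) \in \mathbb{R}^\nu_{\ge 0}$, and
$y^{[i]}(1) = N f^{[i]}(x^{[i]}(0))$. At every time $k \ge 0$, each agent $i$ computes the following convex
combinations:

$$v_x^{[i]}(k) = \sum_{j=1}^N a^i_j(k)\, x^{[j]}(k), \qquad v_y^{[i]}(k) = \sum_{j=1}^N a^i_j(k)\, y^{[j]}(k),$$

$$v_\mu^{[i]}(k) = \sum_{j=1}^N a^i_j(k)\, \mu^{[j]}(k), \qquad v_\lambda^{[i]}(k) = \sum_{j=1}^N a^i_j(k)\, \lambda^{[j]}(k),$$

and generates $x^{[i]}(k+1)$, $y^{[i]}(k+1)$, $\mu^{[i]}(k+1)$ and $\lambda^{[i]}(k+1)$ according to the following rules:

$$x^{[i]}(k+1) = P_X[v_x^{[i]}(k) - \alpha(k) S_x^{[i]}(k)], \qquad y^{[i]}(k+1) = v_y^{[i]}(k) + N\big(f^{[i]}(x^{[i]}(k)) - f^{[i]}(x^{[i]}(k-1))\big),$$

$$\mu^{[i]}(k+1) = v_\mu^{[i]}(k) + \alpha(k)[g(v_x^{[i]}(k))]^+, \qquad \lambda^{[i]}(k+1) = v_\lambda^{[i]}(k) + \alpha(k)|h(v_x^{[i]}(k))|,$$

where $P_X$ is the projection operator onto the set $X$, the scalars $a^i_j(k)$ are non-negative weights
and the positive scalars $\alpha(k)$ are step-sizes$^1$. The vector

$$S_x^{[i]}(k) := \mathcal{D}f^{[i]}(v_x^{[i]}(k)) + \sum_{\ell=1}^m v_\mu^{[i]}(k)_\ell\, \mathcal{D}[g_\ell(v_x^{[i]}(k))]^+ + \sum_{\ell=1}^\nu v_\lambda^{[i]}(k)_\ell\, \mathcal{D}|h_\ell|(v_x^{[i]}(k))$$

is a subgradient of $\mathcal{H}^{[i]}_{w^{[i]}(k)}(x)$ at $x = v_x^{[i]}(k)$, where $w^{[i]}(k) := (v_\mu^{[i]}(k), v_\lambda^{[i]}(k))$ is the convex
combination of the dual estimates of agent $i$ and its neighbors.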

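Given a step-size sequence $\{\alpha(k)\}$, we define its sum over $[0,k]$ by $s(k) := \sum_{\ell=0}^{k} \alpha(\ell)$ and
assume that:

Assumption 4.1 (Step-size assumption): The step-sizes satisfy $\lim_{k\to+\infty} \alpha(k) = 0$, $\sum_{k=0}^{+\infty} \alpha(k) = +\infty$,
$\sum_{k=0}^{+\infty} \alpha(k)^2 < +\infty$, and $\lim_{k\to+\infty} \alpha(k+1)s(k) = 0$, $\sum_{k=0}^{+\infty} \alpha(k+1)^2 s(k) < +\infty$,
$\sum_{k=0}^{+\infty} \alpha(k+1)^2 s(k)^2 < +\infty$.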

$^1$Each agent $i$ executes the update law of $y^{[i]}(k)$ for $k \ge 1$.
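
To make the update rules concrete, the following is a minimal Python sketch of one possible implementation of the DPPDS iteration on a small toy instance. Everything specific in it is an assumption made for illustration: three agents, quadratic local objectives with hypothetical targets `C`, a single inequality $g(x) = x_1 - 1 \le 0$, a single equality $h(x) = x_1 + x_2 - 1 = 0$, a box constraint set, a fixed doubly stochastic weight matrix `A`, and the step-size $\alpha(k) = 1/(k+1)$. It is a sketch of the update structure, not a tuned or validated solver.

```python
N = 3                                             # number of agents (assumption)
C = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]          # hypothetical local targets
A = [[0.50, 0.25, 0.25],                          # fixed doubly stochastic weights
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
LO, HI = -2.0, 2.0                                # X = [LO, HI]^2 (assumption)


def f(i, x):                                      # local objective f^[i]
    return (x[0] - C[i][0]) ** 2 + (x[1] - C[i][1]) ** 2

def grad_f(i, x):
    return [2.0 * (x[0] - C[i][0]), 2.0 * (x[1] - C[i][1])]

def g(x):                                         # inequality constraint g(x) <= 0
    return x[0] - 1.0

def h(x):                                         # equality constraint h(x) = 0
    return x[0] + x[1] - 1.0

def alpha(k):                                     # harmonic step-size
    return 1.0 / (k + 1)

def project(x):                                   # projection onto the box X
    return [min(max(c, LO), HI) for c in x]

def mix(row, vals):                               # convex combination of scalars
    return sum(a * v for a, v in zip(row, vals))

def mix_vec(row, vals):                           # convex combination of 2-vectors
    return [sum(a * v[d] for a, v in zip(row, vals)) for d in range(2)]


def dppds_step(k, x, x_prev, y, mu, lam):
    """One synchronous update of all agents at time k."""
    a = alpha(k)
    new_x, new_y, new_mu, new_lam = [], [], [], []
    for i in range(N):
        vx, vy = mix_vec(A[i], x), mix(A[i], y)
        vmu, vlam = mix(A[i], mu), mix(A[i], lam)
        dg = [1.0, 0.0] if g(vx) > 0 else [0.0, 0.0]     # a subgradient of [g]^+
        sign = (h(vx) > 0) - (h(vx) < 0)
        dh = [float(sign), float(sign)]                  # a subgradient of |h|
        gf = grad_f(i, vx)
        s = [gf[d] + vmu * dg[d] + vlam * dh[d] for d in range(2)]
        new_x.append(project([vx[d] - a * s[d] for d in range(2)]))
        # y(1) is an initial condition; the tracking update starts at k >= 1.
        new_y.append(vy + N * (f(i, x[i]) - f(i, x_prev[i])) if k >= 1 else y[i])
        new_mu.append(vmu + a * max(g(vx), 0.0))
        new_lam.append(vlam + a * abs(h(vx)))
    return new_x, new_y, new_mu, new_lam


def run(steps):
    """Run `steps` iterations from a hypothetical initial state."""
    x = [[-1.0, -1.0], [1.5, 0.5], [0.0, 1.0]]
    x_prev = x
    y = [N * f(i, x[i]) for i in range(N)]               # y^[i](1)
    mu, lam = [0.0] * N, [0.0] * N
    for k in range(steps):
        new_x, y, mu, lam = dppds_step(k, x, x_prev, y, mu, lam)
        x_prev, x = x, new_x
    return x, x_prev, y, mu, lam
```

Note that the dual updates involve no projection, so the multipliers can only grow; this is the feature that motivates the stronger step-size assumption 4.1.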






The following theorem is the main result of this section, characterizing the convergence properties
of the DPPDS algorithm, where an optimal solution and the optimal value are asymptotically agreed
upon.

Theorem 4.2 (Convergence properties of the DPPDS algorithm): Consider the problem (1)
with $X^{[i]} = X$. Let the non-degeneracy assumption 2.2, the balanced communication assumption 2.3
and the periodic strong connectivity assumption 2.4 hold. Consider the sequences
of $\{x^{[i]}(k)\}$ and $\{y^{[i]}(k)\}$ of the distributed penalty primal-dual subgradient algorithm where
the step-sizes $\{\alpha(k)\}$ satisfy the step-size assumption 4.1. Then there exists a primal optimal
solution $\tilde{x} \in X^*$ such that $\lim_{k\to+\infty} \|x^{[i]}(k) - \tilde{x}\| = 0$ for all $i \in V$. Furthermore, we have
$\lim_{k\to+\infty} \|y^{[i]}(k) - p^*\| = 0$ for all $i \in V$.

We provide some remarks to conclude this section.

Remark 4.2: As the primal-dual (sub)gradient algorithms in [1], [22], the DPPDS algorithm
produces a pair of primal and dual estimates at each step. The main differences are the following. Firstly, the
DPPDS algorithm extends the primal-dual subgradient algorithm in [22] to the multi-agent
scenario. Secondly, it further takes the equality constraint into account. The presence of the
equality constraint can make $D_P^*$ unbounded. Therefore, unlike the DLPDS algorithm, the DPPDS
algorithm does not involve dual projection steps onto compact sets. This may cause the
subgradient $S_x^{[i]}(k)$ not to be uniformly bounded, while the boundedness of subgradients is a
standard assumption in the analysis of subgradient methods; e.g., see [3], [10], [20], [21], [22],
[23]. This difficulty will be addressed by a more careful choice of the step-size policy, i.e.,
assumption 4.1, which is stronger than the more standard diminishing step-size scheme, e.g., in
the DLPDS algorithm and [23]. We require this condition in order to prove, in the absence of
the boundedness of $\{S_x^{[i]}(k)\}$, the existence of a number of limits and the summability of expansions
toward Theorem 4.2. Thirdly, the DPPDS algorithm adopts the penalty relaxation instead of the
Lagrangian relaxation in [22]. •

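Remark 4.3: Observe that $\mu^{[i]}(k) \ge 0$, $\lambda^{[i]}(k) \ge 0$ and $v_x^{[i]}(k) \in X$ (due to the fact that $X$
is convex). Furthermore, $([g(v_x^{[i]}(k))]^+, |h(v_x^{[i]}(k))|)$ is a supgradient of $\mathcal{H}^{[i]}_{v_x^{[i]}(k)}$ at $(v_\mu^{[i]}(k), v_\lambda^{[i]}(k))$; i.e., the
following penalty supgradient inequality holds for any $\mu \in \mathbb{R}^m_{\ge 0}$ and $\lambda \in \mathbb{R}^\nu_{\ge 0}$:

$$([g(v_x^{[i]}(k))]^+)^T(\mu - v_\mu^{[i]}(k)) + |h(v_x^{[i]}(k))|^T(\lambda - v_\lambda^{[i]}(k))$$

$$\ge \mathcal{H}^{[i]}(v_x^{[i]}(k), \mu, \lambda) - \mathcal{H}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k), v_\lambda^{[i]}(k)). \qquad (11)$$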




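Remark 4.4: A step-size sequence that satisfies the step-size assumption 4.1 is the harmonic
series $\{\alpha(k) = \frac{1}{k+1}\}_{k \in \mathbb{Z}_{\ge 0}}$. It is obvious that $\lim_{k\to+\infty} \frac{1}{k+1} = 0$, and well-known that
$\sum_{k=0}^{+\infty} \frac{1}{k+1} = +\infty$ and $\sum_{k=0}^{+\infty} \frac{1}{(k+1)^2} < +\infty$. We now proceed to check the property of $\lim_{k\to+\infty} \alpha(k+1)s(k) = 0$.

For any $k \ge 1$, there is an integer $n \ge 1$ such that $2^{n-1} \le k < 2^n$. It holds that

$$s(k) \le s(2^n) = 1 + \frac{1}{2} + \Big(\frac{1}{3} + \frac{1}{4}\Big) + \dots + \Big(\frac{1}{2^{n-1}+1} + \dots + \frac{1}{2^n}\Big)$$

$$\le 1 + 1 + \dots + 1 = n \le \log_2 k + 1.$$

Then we have $\limsup_{k\to+\infty} \alpha(k+1)s(k) \le \lim_{k\to+\infty} \frac{\log_2 k + 1}{k+2} = 0$. Obviously, $\liminf_{k\to+\infty} \alpha(k+1)s(k) \ge 0$. Then we
have the property of $\lim_{k\to+\infty} \alpha(k+1)s(k) = 0$. Since $\log_2 k < (\log_2 k)^2 < (k+2)^{\frac{1}{2}}$, then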

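$$\sum_{k=0}^{+\infty} \alpha(k+1)^2 s(k)^2 \le \sum_{k=0}^{+\infty} \frac{1}{(k+2)^2}\Big((k+2)^{\frac{1}{2}} + 2(k+2)^{\frac{1}{2}} + 1\Big) = \sum_{k=0}^{+\infty} \frac{1}{(k+2)^{\frac{3}{2}}} + \sum_{k=0}^{+\infty} \frac{2}{(k+2)^{\frac{3}{2}}} + \sum_{k=0}^{+\infty} \frac{1}{(k+2)^2} < +\infty.$$

Additionally, we have $\sum_{k=0}^{+\infty} \alpha(k+1)^2 s(k) \le \sum_{k=0}^{+\infty} \frac{1}{(k+2)^{\frac{3}{2}}} + \sum_{k=0}^{+\infty} \frac{1}{(k+2)^2} < +\infty$. •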
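
The properties claimed for the harmonic step-size can also be probed numerically. The sketch below is an illustration under the assumption $\alpha(k) = 1/(k+1)$; the function name and the returned tuple are hypothetical. It accumulates $s(k)$ and reports the product $\alpha(k+1)s(k)$ at the last index together with the partial sums of $\alpha(k+1)^2 s(k)$ and $\alpha(k+1)^2 s(k)^2$. A finite run cannot establish a limit or summability; it only shows the product shrinking and the partial sums staying small.

```python
def harmonic_step_size_stats(n_terms):
    """Statistics of alpha(k) = 1/(k+1) over k = 0, ..., n_terms - 1.

    Returns (alpha(k+1)*s(k) at the last k,
             partial sum of alpha(k+1)^2 * s(k),
             partial sum of alpha(k+1)^2 * s(k)^2).
    """
    s = 0.0            # running value of s(k) = sum_{l=0}^{k} alpha(l)
    sum_a2s = 0.0
    sum_a2s2 = 0.0
    product = None
    for k in range(n_terms):
        s += 1.0 / (k + 1)             # now s equals s(k)
        a_next = 1.0 / (k + 2)         # alpha(k+1)
        product = a_next * s
        sum_a2s += a_next ** 2 * s
        sum_a2s2 += (a_next * s) ** 2
    return product, sum_a2s, sum_a2s2
```

Comparing, for instance, `harmonic_step_size_stats(100)` with `harmonic_step_size_stats(20000)` shows the product decreasing only at a logarithmic-over-linear rate, which is consistent with the bound $s(k) \le \log_2 k + 1$ used in the remark.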

V. Convergence analysis 

In this section, we provide the proofs for the main results of this paper, Theorems 3.2 and 4.2.
We start our analysis by providing some useful properties of the sequences weighted by $\{\alpha(k)\}$.

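Lemma 5.1 (Convergence properties of weighted sequences): Let $K \ge 0$. Consider the
sequence $\{\delta(k)\}$ defined by $\delta(k) := \frac{\sum_{\ell=K}^{k-1} \alpha(\ell)\rho(\ell)}{\sum_{\ell=K}^{k-1} \alpha(\ell)}$ where $k \ge K+1$, $\alpha(k) > 0$ and $\sum_{k=K}^{+\infty} \alpha(k) = +\infty$.

(a) If $\lim_{k\to+\infty} \rho(k) = +\infty$, then $\lim_{k\to+\infty} \delta(k) = +\infty$.

(b) If $\lim_{k\to+\infty} \rho(k) = \rho^*$, then $\lim_{k\to+\infty} \delta(k) = \rho^*$.

Proof: (a) For any $\Pi > 0$, there exists $k_1 \ge K$ such that $\rho(k) \ge \Pi$ for all $k \ge k_1$. Then the
following holds for all $k \ge k_1 + 1$:

$$\delta(k) \ge \frac{1}{\sum_{\ell=K}^{k-1} \alpha(\ell)} \Big( \sum_{\ell=K}^{k_1-1} \alpha(\ell)\rho(\ell) + \sum_{\ell=k_1}^{k-1} \alpha(\ell)\Pi \Big) = \Pi + \frac{1}{\sum_{\ell=K}^{k-1} \alpha(\ell)} \Big( \sum_{\ell=K}^{k_1-1} \alpha(\ell)\rho(\ell) - \sum_{\ell=K}^{k_1-1} \alpha(\ell)\Pi \Big).$$

Take the limit on $k$ in the above estimate and we have $\liminf_{k\to+\infty} \delta(k) \ge \Pi$. Since $\Pi$ is arbitrary,
then $\lim_{k\to+\infty} \delta(k) = +\infty$.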

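(b) For any $\epsilon > 0$, there exists $k_2 \ge K$ such that $\|\rho(k) - \rho^*\| \le \epsilon$ for all $k \ge k_2 + 1$. Then
we have

$$\|\delta(k) - \rho^*\| \le \frac{\sum_{\ell=K}^{k-1} \alpha(\ell)\|\rho(\ell) - \rho^*\|}{\sum_{\ell=K}^{k-1} \alpha(\ell)} \le \frac{\sum_{\ell=K}^{k_2} \alpha(\ell)\|\rho(\ell) - \rho^*\|}{\sum_{\ell=K}^{k-1} \alpha(\ell)} + \epsilon.$$

Take the limit on $k$ in the above estimate and we have $\limsup_{k\to+\infty} \|\delta(k) - \rho^*\| \le \epsilon$. Since $\epsilon$ is
arbitrary, then $\lim_{k\to+\infty} \|\delta(k) - \rho^*\| = 0$. ■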
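
As a small numerical illustration of part (b), the sketch below computes the weighted average $\delta(k)$ for user-supplied weight and value sequences. The function name and the example sequences ($\alpha(\ell) = 1/(\ell+1)$ and $\rho(\ell) = 2 + 1/(\ell+1)$, so that $\rho^* = 2$) are assumptions chosen for this example only.

```python
def weighted_average(alpha, rho, K, k):
    """delta(k) = sum_{l=K}^{k-1} alpha(l)*rho(l) / sum_{l=K}^{k-1} alpha(l)."""
    if k < K + 1:
        raise ValueError("delta(k) is only defined for k >= K + 1")
    num = sum(alpha(l) * rho(l) for l in range(K, k))
    den = sum(alpha(l) for l in range(K, k))
    return num / den


# Hypothetical example sequences: rho(l) -> 2 as l -> +infinity.
example_alpha = lambda l: 1.0 / (l + 1)
example_rho = lambda l: 2.0 + 1.0 / (l + 1)
```

With these harmonic weights the average approaches the limit slowly, since the denominator only grows logarithmically; the lemma guarantees convergence but says nothing about the rate.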

A. Proofs of Theorem 3.2

We now proceed to show Theorem 3.2. To do that, we first rewrite the DLPDS algorithm into
the following form:

$$x^{[i]}(k+1) = v_x^{[i]}(k) + e_x^{[i]}(k), \quad \mu^{[i]}(k+1) = v_\mu^{[i]}(k) + e_\mu^{[i]}(k), \quad y^{[i]}(k+1) = v_y^{[i]}(k) + u^{[i]}(k),$$

where $e_x^{[i]}(k)$ and $e_\mu^{[i]}(k)$ are projection errors described by

$$e_x^{[i]}(k) := P_{X^{[i]}}[v_x^{[i]}(k) - \alpha(k)\mathcal{D}\mathcal{L}_x^{[i]}(k)] - v_x^{[i]}(k), \quad e_\mu^{[i]}(k) := P_{M^{[i]}}[v_\mu^{[i]}(k) + \alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k)] - v_\mu^{[i]}(k),$$

and $u^{[i]}(k) := N(f^{[i]}(x^{[i]}(k)) - f^{[i]}(x^{[i]}(k-1)))$ is the local input which allows agent $i$ to track
the variation of the local objective function $f^{[i]}$. In this manner, the update law of each estimate
is decomposed into two parts: a convex sum to fuse the information of each agent with those
of its neighbors, plus some local error or input. With this decomposition, all the update laws
are put into the same form as the dynamic average consensus algorithm in the Appendix. This
observation allows us to divide the analysis of the DLPDS algorithm into two steps. Firstly, we
show that all the estimates asymptotically achieve consensus by utilizing the property that the local
errors and inputs are diminishing. Secondly, we further show that the consensus vectors coincide
with a pair of primal and Lagrangian dual optimal solutions and the optimal value.

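Lemma 5.2 (Lipschitz continuity of $\mathcal{L}_x^{[i]}$ and $\mathcal{L}_\mu^{[i]}$): Consider $\mathcal{L}_\mu^{[i]}$ and $\mathcal{L}_x^{[i]}$. Then there are
$L > 0$ and $R > 0$ such that $\|\mathcal{D}\mathcal{L}_\mu^{[i]}(x)\| \le L$ and $\|\mathcal{D}\mathcal{L}_x^{[i]}(\mu)\| \le R$ for each pair of $x \in
\mathrm{co}(\cup_{i=1}^N X^{[i]})$ and $\mu \in \mathrm{co}(\cup_{i=1}^N M^{[i]})$. Furthermore, for each $\mu \in \mathrm{co}(\cup_{i=1}^N M^{[i]})$, the function $\mathcal{L}_\mu^{[i]}$ is
Lipschitz continuous with Lipschitz constant $L$ over $\mathrm{co}(\cup_{i=1}^N X^{[i]})$, and for each $x \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$,
the function $\mathcal{L}_x^{[i]}$ is Lipschitz continuous with Lipschitz constant $R$ over $\mathrm{co}(\cup_{i=1}^N M^{[i]})$.

Proof: Observe that $\mathcal{D}\mathcal{L}_\mu^{[i]} = \mathcal{D}f^{[i]} + \sum_{\ell=1}^m \mu_\ell \mathcal{D}g_\ell$ and $\mathcal{D}\mathcal{L}_x^{[i]} = g$. Since $f^{[i]}$ and $g_\ell$ are convex,
it follows from Proposition 5.4.2 in Q that $\partial f^{[i]}$ and $\partial g_\ell$ are bounded over the compact
$\mathrm{co}(\cup_{i=1}^N X^{[i]})$. Since $\mathrm{co}(\cup_{i=1}^N M^{[i]})$ is bounded, so is $\partial\mathcal{L}_\mu^{[i]}$; i.e., for any $\mu \in \mathrm{co}(\cup_{i=1}^N M^{[i]})$, there
exists $L > 0$ such that $\|\partial\mathcal{L}_\mu^{[i]}(x)\| \le L$ for all $x \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$. Since $g_\ell$ is continuous (due
to its convexity) and $\mathrm{co}(\cup_{i=1}^N X^{[i]})$ is bounded, then $g$ and thus $\partial\mathcal{L}_x^{[i]}$ are bounded; i.e., for any
$x \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$, there exists $R > 0$ such that $\|\partial\mathcal{L}_x^{[i]}(\mu)\| \le R$ for all $\mu \in \mathrm{co}(\cup_{i=1}^N M^{[i]})$.
It follows from the Lagrangian subgradient inequality that

$$\mathcal{D}\mathcal{L}_\mu^{[i]}(x')^T(x - x') \le \mathcal{L}_\mu^{[i]}(x) - \mathcal{L}_\mu^{[i]}(x'), \qquad \mathcal{D}\mathcal{L}_\mu^{[i]}(x)^T(x' - x) \le \mathcal{L}_\mu^{[i]}(x') - \mathcal{L}_\mu^{[i]}(x),$$

for any $x, x' \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$. By using the boundedness of the subdifferentials, the above two
inequalities give that $-L\|x - x'\| \le \mathcal{L}_\mu^{[i]}(x) - \mathcal{L}_\mu^{[i]}(x') \le L\|x - x'\|$. This implies that $\|\mathcal{L}_\mu^{[i]}(x) -
\mathcal{L}_\mu^{[i]}(x')\| \le L\|x - x'\|$ for any $x, x' \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$. The proof for the Lipschitz continuity of
$\mathcal{L}_x^{[i]}$ is analogous by using the Lagrangian supgradient inequality. ■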
The following lemma provides a basic iteration relation used in the convergence proof for the 
DLPDS algorithm. 

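Lemma 5.3 (Basic iteration relation): Let the balanced communication assumption 2.3 and
the periodic strong connectivity assumption 2.4 hold. For any $x \in X$, any $\mu \in M$ and all $k \ge 0$,
the following estimates hold:

$$\sum_{i=1}^N \|e_x^{[i]}(k) + \alpha(k)\mathcal{D}\mathcal{L}_x^{[i]}(k)\|^2 \le \sum_{i=1}^N \alpha(k)^2\|\mathcal{D}\mathcal{L}_x^{[i]}(k)\|^2 + \sum_{i=1}^N \{\|x^{[i]}(k) - x\|^2 - \|x^{[i]}(k+1) - x\|^2\}$$

$$- \sum_{i=1}^N 2\alpha(k)\big(\mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k)) - \mathcal{L}^{[i]}(x, v_\mu^{[i]}(k))\big), \qquad (12)$$

$$\sum_{i=1}^N \|e_\mu^{[i]}(k) - \alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k)\|^2 \le \sum_{i=1}^N \alpha(k)^2\|\mathcal{D}\mathcal{L}_\mu^{[i]}(k)\|^2 + \sum_{i=1}^N \{\|\mu^{[i]}(k) - \mu\|^2 - \|\mu^{[i]}(k+1) - \mu\|^2\}$$

$$+ \sum_{i=1}^N 2\alpha(k)\big(\mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k)) - \mathcal{L}^{[i]}(v_x^{[i]}(k), \mu)\big). \qquad (13)$$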
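Proof: By Lemma 9.1 with $Z = M^{[i]}$, $z = v_\mu^{[i]}(k) + \alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k)$ and $y = \mu \in M$, we have
that for all $k \ge 0$

$$\sum_{i=1}^N \|e_\mu^{[i]}(k) - \alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k)\|^2 \le \sum_{i=1}^N \|v_\mu^{[i]}(k) + \alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k) - \mu\|^2 - \sum_{i=1}^N \|\mu^{[i]}(k+1) - \mu\|^2$$

$$= \sum_{i=1}^N \|v_\mu^{[i]}(k) - \mu\|^2 + \sum_{i=1}^N \alpha(k)^2\|\mathcal{D}\mathcal{L}_\mu^{[i]}(k)\|^2 + \sum_{i=1}^N 2\alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k)^T(v_\mu^{[i]}(k) - \mu) - \sum_{i=1}^N \|\mu^{[i]}(k+1) - \mu\|^2$$

$$\le \sum_{i=1}^N \alpha(k)^2\|\mathcal{D}\mathcal{L}_\mu^{[i]}(k)\|^2 + \sum_{i=1}^N 2\alpha(k)\mathcal{D}\mathcal{L}_\mu^{[i]}(k)^T(v_\mu^{[i]}(k) - \mu) + \sum_{i=1}^N \|\mu^{[i]}(k) - \mu\|^2 - \sum_{i=1}^N \|\mu^{[i]}(k+1) - \mu\|^2. \qquad (14)$$

One can show (13) by substituting the following Lagrangian supgradient inequality into (14):

$$\mathcal{D}\mathcal{L}_\mu^{[i]}(k)^T(\mu - v_\mu^{[i]}(k)) \ge \mathcal{L}^{[i]}(v_x^{[i]}(k), \mu) - \mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k)).$$

Similarly, inequality (12) can be shown by using the following Lagrangian subgradient inequality:

$$\mathcal{D}\mathcal{L}_x^{[i]}(k)^T(x - v_x^{[i]}(k)) \le \mathcal{L}^{[i]}(x, v_\mu^{[i]}(k)) - \mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k)). \; ■$$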

The following lemma shows that the consensus is asymptotically reached. 

Lemma 5.4 (Achieving consensus): Let the non-degeneracy assumption 2.2, the balanced
communication assumption 2.3 and the periodic strong connectivity assumption 2.4 hold. Consider
the sequences of $\{x^{[i]}(k)\}$, $\{\mu^{[i]}(k)\}$ and $\{y^{[i]}(k)\}$ of the DLPDS algorithm with the step-size
sequence $\{\alpha(k)\}$ satisfying $\lim_{k\to+\infty} \alpha(k) = 0$. Then there exist $x^* \in X$ and $\mu^* \in M$ such that
$\lim_{k\to+\infty} \|x^{[i]}(k) - x^*\| = 0$, $\lim_{k\to+\infty} \|\mu^{[i]}(k) - \mu^*\| = 0$ for all $i \in V$, and $\lim_{k\to+\infty} \|y^{[i]}(k) - y^{[j]}(k)\| = 0$
for all $i, j \in V$.

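Proof: Observe that $v_x^{[i]}(k) \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$ and $v_\mu^{[i]}(k) \in \mathrm{co}(\cup_{i=1}^N M^{[i]})$. Then it follows from
Lemma 5.2 that $\|\mathcal{D}\mathcal{L}_x^{[i]}(k)\| \le L$. From Lemma 5.3 it follows that

$$\sum_{i=1}^N \|x^{[i]}(k+1) - x\|^2 \le \sum_{i=1}^N \|x^{[i]}(k) - x\|^2 + \sum_{i=1}^N \alpha(k)^2 L^2$$

$$+ \sum_{i=1}^N 2\alpha(k)\big(\|\mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k))\| + \|\mathcal{L}^{[i]}(x, v_\mu^{[i]}(k))\|\big). \qquad (15)$$

Notice that $v_x^{[i]}(k) \in \mathrm{co}(\cup_{i=1}^N X^{[i]})$, $v_\mu^{[i]}(k) \in \mathrm{co}(\cup_{i=1}^N M^{[i]})$ and $x \in X$ are bounded. Since $\mathcal{L}^{[i]}$ is
continuous, then $\mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k))$ and $\mathcal{L}^{[i]}(x, v_\mu^{[i]}(k))$ are bounded. Since $\lim_{k\to+\infty} \alpha(k) = 0$, the
last two terms on the right-hand side of (15) converge to zero as $k \to +\infty$. Taking limits on
both sides of (15), one can see that $\limsup_{k\to+\infty} \sum_{i=1}^N \|x^{[i]}(k+1) - x\|^2 \le \liminf_{k\to+\infty} \sum_{i=1}^N \|x^{[i]}(k) - x\|^2$
for any $x \in X$, and thus $\lim_{k\to+\infty} \sum_{i=1}^N \|x^{[i]}(k) - x\|^2$ exists for any $x \in X$. On the other hand,
taking limits on both sides of (12) we obtain $\lim_{k\to+\infty} \sum_{i=1}^N \|e_x^{[i]}(k) + \alpha(k)\mathcal{D}\mathcal{L}_x^{[i]}(k)\|^2 = 0$, and therefore
we deduce that $\lim_{k\to+\infty} \|e_x^{[i]}(k)\| = 0$ for all $i \in V$. It follows from Proposition 9.1 in the
Appendix that $\lim_{k\to+\infty} \|x^{[i]}(k) - x^{[j]}(k)\| = 0$ for all $i, j \in V$. Combining this with the property
that $\lim_{k\to+\infty} \sum_{i=1}^N \|x^{[i]}(k) - x\|^2$ exists for any $x \in X$, we deduce that there exists $x^* \in \mathbb{R}^n$ such that
$\lim_{k\to+\infty} \|x^{[i]}(k) - x^*\| = 0$ for all $i \in V$. Since $x^{[i]}(k) \in X^{[i]}$ and $X^{[i]}$ is closed, it implies that
$x^* \in X^{[i]}$ for all $i \in V$ and thus $x^* \in X$. Similarly, one can show that there is $\mu^* \in M$ such
that $\lim_{k\to+\infty} \|\mu^{[i]}(k) - \mu^*\| = 0$ for all $i \in V$.

Since $\lim_{k\to+\infty} \|x^{[i]}(k) - x^*\| = 0$ and $f^{[i]}$ is continuous, then $\lim_{k\to+\infty} \|u^{[i]}(k)\| = 0$. It follows from
Proposition 9.1 that $\lim_{k\to+\infty} \|y^{[i]}(k) - y^{[j]}(k)\| = 0$ for all $i, j \in V$. ■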

From Lemma 5.4, we know that the sequences of $\{x^{[i]}(k)\}$ and $\{\mu^{[i]}(k)\}$ of the DLPDS
algorithm asymptotically agree on some point in $X$ and some point in $M$, respectively. Denote
by $\Theta \subseteq X \times M$ the set of such limit points. We further denote the averages of primal and
dual estimates by $\hat{x}(k) := \frac{1}{N}\sum_{i=1}^N x^{[i]}(k)$ and $\hat{\mu}(k) := \frac{1}{N}\sum_{i=1}^N \mu^{[i]}(k)$, respectively. The following
lemma further characterizes that the points in $\Theta$ are saddle points of the Lagrangian function $\mathcal{L}$
over $X \times M$.

Lemma 5.5 (Saddle-point characterization of $\Theta$): Each point in $\Theta$ is a saddle point of the
Lagrangian function $\mathcal{L}$ over $X \times M$.
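Proof: Denote by the maximum deviation of primal estimates $\Delta_x(k) := \max_{i,j \in V} \|x^{[j]}(k) -
x^{[i]}(k)\|$. Notice that

$$\|v_x^{[i]}(k) - \hat{x}(k)\| = \Big\|\sum_{j=1}^N a^i_j(k)\,x^{[j]}(k) - \frac{1}{N}\sum_{j=1}^N x^{[j]}(k)\Big\| \le \sum_{j=1}^N a^i_j(k)\|x^{[j]}(k) - x^{[i]}(k)\| + \frac{1}{N}\sum_{j=1}^N \|x^{[i]}(k) - x^{[j]}(k)\| \le 2\Delta_x(k).$$

Denote by the maximum deviation of dual estimates $\Delta_\mu(k) := \max_{i,j \in V} \|\mu^{[j]}(k) - \mu^{[i]}(k)\|$.

Similarly, we have $\|v_\mu^{[i]}(k) - \hat{\mu}(k)\| \le 2\Delta_\mu(k)$.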
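
The bound $\|v_x^{[i]}(k) - \hat{x}(k)\| \le 2\Delta_x(k)$ is easy to check numerically for scalar estimates. The following sketch is illustrative only; the helper names are hypothetical, and the weight matrix and estimates in any call are up to the reader, provided that each row of the weight matrix is a vector of non-negative weights summing to one.

```python
def mix(row, values):
    """Convex combination sum_j a_j * values[j] for one agent."""
    return sum(a * v for a, v in zip(row, values))


def max_deviation(values):
    """Delta(k) = max_{i,j} |values[j] - values[i]| for scalar estimates."""
    return max(abs(a - b) for a in values for b in values)


def average_bound_holds(weights, values, tol=1e-12):
    """Check |v_i - mean(values)| <= 2 * Delta for every agent i."""
    avg = sum(values) / len(values)
    delta = max_deviation(values)
    return all(abs(mix(row, values) - avg) <= 2.0 * delta + tol
               for row in weights)
```

Since both $v_x^{[i]}(k)$ and $\hat{x}(k)$ lie in the convex hull of the agents' estimates, the factor 2 is conservative; the sketch only confirms the inequality as stated.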

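We will show this lemma by contradiction. Suppose that there is $(x^*, \mu^*) \in \Theta$ which is not a
saddle point of $\mathcal{L}$ over $X \times M$. Then at least one of the following inequalities holds:

$$\exists x \in X \;\; \text{s.t.} \;\; \mathcal{L}(x^*, \mu^*) > \mathcal{L}(x, \mu^*), \qquad (16)$$

$$\exists \mu \in M \;\; \text{s.t.} \;\; \mathcal{L}(x^*, \mu) > \mathcal{L}(x^*, \mu^*). \qquad (17)$$

Suppose first that (16) holds. Then, there exists $\varsigma > 0$ such that $\mathcal{L}(x^*, \mu^*) = \mathcal{L}(x, \mu^*) + \varsigma$.
Consider the sequences of $\{x^{[i]}(k)\}$ and $\{\mu^{[i]}(k)\}$ which converge respectively to $x^*$ and $\mu^*$
defined above. Notice that estimate (12) leads to

$$\sum_{i=1}^N \|x^{[i]}(k+1) - x\|^2 \le \sum_{i=1}^N \|x^{[i]}(k) - x\|^2 + \alpha(k)^2 \sum_{i=1}^N \|\mathcal{D}\mathcal{L}_x^{[i]}(k)\|^2$$

$$- 2\alpha(k) \sum_{i=1}^N \big(A_i(k) + B_i(k) + C_i(k) + D_i(k) + E_i(k) + F_i(k)\big),$$

where

$$A_i(k) := \mathcal{L}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k)) - \mathcal{L}^{[i]}(\hat{x}(k), v_\mu^{[i]}(k)), \quad B_i(k) := \mathcal{L}^{[i]}(\hat{x}(k), v_\mu^{[i]}(k)) - \mathcal{L}^{[i]}(\hat{x}(k), \hat{\mu}(k)),$$
$$C_i(k) := \mathcal{L}^{[i]}(\hat{x}(k), \hat{\mu}(k)) - \mathcal{L}^{[i]}(x^*, \hat{\mu}(k)), \quad D_i(k) := \mathcal{L}^{[i]}(x^*, \hat{\mu}(k)) - \mathcal{L}^{[i]}(x^*, \mu^*),$$
$$E_i(k) := \mathcal{L}^{[i]}(x^*, \mu^*) - \mathcal{L}^{[i]}(x, \mu^*), \quad F_i(k) := \mathcal{L}^{[i]}(x, \mu^*) - \mathcal{L}^{[i]}(x, v_\mu^{[i]}(k)).$$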
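It follows from the Lipschitz continuity property of $\mathcal{L}^{[i]}$, see Lemma 5.2, that

$$\|A_i(k)\| \le L\|v_x^{[i]}(k) - \hat{x}(k)\| \le 2L\Delta_x(k), \qquad \|B_i(k)\| \le R\|v_\mu^{[i]}(k) - \hat{\mu}(k)\| \le 2R\Delta_\mu(k),$$

$$\|C_i(k)\| \le L\|\hat{x}(k) - x^*\| \le \frac{L}{N}\sum_{i=1}^N \|x^{[i]}(k) - x^*\|, \qquad \|D_i(k)\| \le R\|\hat{\mu}(k) - \mu^*\| \le \frac{R}{N}\sum_{i=1}^N \|\mu^{[i]}(k) - \mu^*\|,$$

$$\|F_i(k)\| \le R\|\mu^* - v_\mu^{[i]}(k)\| \le R\|\mu^* - \hat{\mu}(k)\| + R\|\hat{\mu}(k) - v_\mu^{[i]}(k)\| \le \frac{R}{N}\sum_{i=1}^N \|\mu^* - \mu^{[i]}(k)\| + 2R\Delta_\mu(k).$$

Since $\lim_{k\to+\infty} \|x^{[i]}(k) - x^*\| = 0$, $\lim_{k\to+\infty} \|\mu^{[i]}(k) - \mu^*\| = 0$, $\lim_{k\to+\infty} \Delta_x(k) = 0$ and $\lim_{k\to+\infty} \Delta_\mu(k) = 0$,
then all $A_i(k)$, $B_i(k)$, $C_i(k)$, $D_i(k)$, $F_i(k)$ converge to zero as $k \to +\infty$. Then there exists $k_0 \ge 0$
such that for all $k \ge k_0$, it holds that

$$\sum_{i=1}^N \|x^{[i]}(k+1) - x\|^2 \le \sum_{i=1}^N \|x^{[i]}(k) - x\|^2 + N\alpha(k)^2 L^2 - \varsigma\alpha(k).$$

Following a recursive argument, we have that for all $k \ge k_0$, it holds that

$$\sum_{i=1}^N \|x^{[i]}(k+1) - x\|^2 \le \sum_{i=1}^N \|x^{[i]}(k_0) - x\|^2 + N L^2 \sum_{\tau=k_0}^{k} \alpha(\tau)^2 - \varsigma \sum_{\tau=k_0}^{k} \alpha(\tau).$$

Since $\sum_{\tau=k_0}^{+\infty} \alpha(\tau) = +\infty$ and $\sum_{\tau=k_0}^{+\infty} \alpha(\tau)^2 < +\infty$, and $x^{[i]}(k_0) \in X^{[i]}$, $x \in X$ are bounded, the
above estimate yields a contradiction by taking $k$ sufficiently large. In other words, (16) cannot
hold. Following a parallel argument, one can show that (17) cannot hold either. This ensures
that each $(x^*, \mu^*) \in \Theta$ is a saddle point of $\mathcal{L}$ over $X \times M$. ■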
The combination of (c) in Lemma 3.1 and Lemma 5.5 gives that, for each $(x^*, \mu^*) \in \Theta$,
we have that $\mathcal{L}(x^*, \mu^*) = p^*$ and $\mu^*$ is Lagrangian dual optimal. We still need to verify that
$x^*$ is a primal optimal solution. We are now in the position to show Theorem 3.2 based on the
following two claims.
Proofs of Theorem 3.2:

Claim 1: Each point $(x^*, \mu^*) \in \Theta$ is a point in $X^* \times D_L^*$.

Proof: The Lagrangian dual optimality of $\mu^*$ follows from (c) in Lemma 3.1 and Lemma 5.5.
To characterize the primal optimality of $x^*$, we define an auxiliary sequence $\{z(k)\}$ by $z(k) :=
\frac{\sum_{\tau=0}^{k-1} \alpha(\tau)\hat{x}(\tau)}{\sum_{\tau=0}^{k-1} \alpha(\tau)}$, which is a weighted version of the average of primal estimates. Since $\lim_{k\to+\infty} \hat{x}(k) = x^*$,
it follows from Lemma 5.1 (b) that $\lim_{k\to+\infty} z(k) = x^*$.

Since $(x^*, \mu^*)$ is a saddle point of $\mathcal{L}$ over $X \times M$, then $\mathcal{L}(x^*, \mu) \le \mathcal{L}(x^*, \mu^*)$ for any $\mu \in M$;
i.e., the following relation holds for any $\mu \in M$:

$$g(x^*)^T(\mu - \mu^*) \le 0. \qquad (18)$$

Choose $\mu_a = \mu^* + \min_{i \in V} \theta^{[i]} \frac{\mu^*}{\|\mu^*\|}$ where $\theta^{[i]} > 0$ is given in the definition of $M^{[i]}$. Then $\mu_a \ge 0$
and $\|\mu_a\| \le \|\mu^*\| + \min_{i \in V} \theta^{[i]}$, implying $\mu_a \in M$. Letting $\mu = \mu_a$ in (18) gives that

$$\frac{\min_{i \in V} \theta^{[i]}}{\|\mu^*\|}\, g(x^*)^T \mu^* \le 0.$$

Since $\theta^{[i]} > 0$, we have $g(x^*)^T \mu^* \le 0$. On the other hand, we choose $\mu_b = \frac{1}{2}\mu^*$ and then $\mu_b \in M$.
Letting $\mu = \mu_b$ in (18) gives that $-\frac{1}{2} g(x^*)^T \mu^* \le 0$ and thus $g(x^*)^T \mu^* \ge 0$. The combination of
the above two estimates guarantees the property of $g(x^*)^T \mu^* = 0$.

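We now proceed to show $g(x^*) \le 0$ by contradiction. Assume that $g(x^*) \le 0$ does not hold.
Denote $J^+(x^*) := \{1 \le \ell \le m \mid g_\ell(x^*) > 0\} \ne \emptyset$ and $\eta := \min_{\ell \in J^+(x^*)}\{g_\ell(x^*)\}$. Then $\eta > 0$.
Since $g$ is continuous and $v_x^{[i]}(k)$ converges to $x^*$, there exists $K \ge 0$ such that $g_\ell(v_x^{[i]}(k)) \ge \frac{\eta}{2}$ for
all $k \ge K$ and all $\ell \in J^+(x^*)$. Since $v_\mu^{[i]}(k)$ converges to $\mu^*$, without loss of generality, we say
that $\|v_\mu^{[i]}(k) - \mu^*\| \le \frac{1}{2}\min_{i \in V} \theta^{[i]}$ for all $k \ge K$. Choose $\hat{\mu}$ such that $\hat{\mu}_\ell = \mu^*_\ell$ for $\ell \notin J^+(x^*)$ and
$\hat{\mu}_\ell = \mu^*_\ell + \frac{1}{\sqrt{m}}\min_{i \in V} \theta^{[i]}$ for $\ell \in J^+(x^*)$. Since $\mu^* \ge 0$ and $\theta^{[i]} > 0$, thus $\hat{\mu} \ge 0$. Furthermore,
$\|\hat{\mu}\| \le \|\mu^*\| + \min_{i \in V} \theta^{[i]}$, then $\hat{\mu} \in M$. Equating $\mu$ to $\hat{\mu}$ and letting $\mathcal{D}\mathcal{L}_\mu^{[i]}(k) = g(v_x^{[i]}(k))$ in the
estimate (14), the following holds for $k \ge K$:

$$N|J^+(x^*)|\,\eta \min_{i \in V} \theta^{[i]}\,\alpha(k) \le 2\alpha(k) \sum_{i=1}^N \sum_{\ell \in J^+(x^*)} g_\ell(v_x^{[i]}(k))(\hat{\mu} - v_\mu^{[i]}(k))_\ell$$

$$\le \sum_{i=1}^N \|\mu^{[i]}(k) - \hat{\mu}\|^2 - \sum_{i=1}^N \|\mu^{[i]}(k+1) - \hat{\mu}\|^2 + N R^2 \alpha(k)^2$$

$$- 2\alpha(k) \sum_{i=1}^N \sum_{\ell \notin J^+(x^*)} g_\ell(v_x^{[i]}(k))(\hat{\mu} - v_\mu^{[i]}(k))_\ell. \qquad (19)$$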
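Summing (19) over $[K, k-1]$ with $k \ge K+1$, dividing by $\sum_{\tau=K}^{k-1} \alpha(\tau)$ on both sides, and
using $-\sum_{i=1}^N \|\mu^{[i]}(k) - \hat{\mu}\|^2 \le 0$, we obtain

$$N|J^+(x^*)|\,\eta \min_{i \in V} \theta^{[i]} \le \frac{1}{\sum_{\tau=K}^{k-1} \alpha(\tau)} \Big\{ \sum_{i=1}^N \|\mu^{[i]}(K) - \hat{\mu}\|^2 + N R^2 \sum_{\tau=K}^{k-1} \alpha(\tau)^2$$

$$- \sum_{\tau=K}^{k-1} 2\alpha(\tau) \sum_{i=1}^N \sum_{\ell \notin J^+(x^*)} g_\ell(v_x^{[i]}(\tau))(\hat{\mu} - v_\mu^{[i]}(\tau))_\ell \Big\}. \qquad (20)$$

Since $\mu^{[i]}(K) \in M^{[i]}$, $\hat{\mu} \in M$ are bounded and $\sum_{\tau=K}^{+\infty} \alpha(\tau) = +\infty$, then the limit of the first
term on the right-hand side of (20) is zero as $k \to +\infty$. Since $\sum_{\tau=K}^{+\infty} \alpha(\tau)^2 < +\infty$, then the
limit of the second term is zero as $k \to +\infty$. Since $\lim_{k\to+\infty} v_x^{[i]}(k) = x^*$ and $\lim_{k\to+\infty} v_\mu^{[i]}(k) = \mu^*$,
thus $\lim_{k\to+\infty} 2\sum_{i=1}^N \sum_{\ell \notin J^+(x^*)} g_\ell(v_x^{[i]}(k))(\hat{\mu} - v_\mu^{[i]}(k))_\ell = 0$. Then it follows from Lemma 5.1 (b) that
the limit of the third term is zero as $k \to +\infty$. Then we have $N|J^+(x^*)|\,\eta \min_{i \in V} \theta^{[i]} \le 0$.
Recall that $|J^+(x^*)| > 0$, $\eta > 0$ and $\theta^{[i]} > 0$. Then we reach a contradiction, implying that
$g(x^*) \le 0$.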

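Since $x^* \in X$ and $g(x^*) \le 0$, then $x^*$ is a feasible solution and thus $f(x^*) \ge p^*$. On the other
hand, since $z(k)$ is a convex combination of $\hat{x}(0), \dots, \hat{x}(k-1)$ and $f$ is convex, we have
the following estimate:

$$f(z(k)) \le \frac{\sum_{\tau=0}^{k-1} \alpha(\tau) f(\hat{x}(\tau))}{\sum_{\tau=0}^{k-1} \alpha(\tau)} = \frac{1}{\sum_{\tau=0}^{k-1} \alpha(\tau)} \Big\{ \sum_{\tau=0}^{k-1} \alpha(\tau)\mathcal{L}(\hat{x}(\tau), \hat{\mu}(\tau)) - \sum_{\tau=0}^{k-1} N\alpha(\tau)\hat{\mu}(\tau)^T g(\hat{x}(\tau)) \Big\}.$$

Recall the following convergence properties:

$$\lim_{k\to+\infty} z(k) = x^*, \quad \lim_{k\to+\infty} \mathcal{L}(\hat{x}(k), \hat{\mu}(k)) = \mathcal{L}(x^*, \mu^*) = p^*, \quad \lim_{k\to+\infty} \hat{\mu}(k)^T g(\hat{x}(k)) = g(x^*)^T \mu^* = 0.$$

It follows from Lemma 5.1 (b) that $f(x^*) \le p^*$. Therefore, we have $f(x^*) = p^*$, and thus $x^*$ is
a primal optimal point. ■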
Claim 2: It holds that $\lim_{k\to+\infty} \|y^{[i]}(k) - p^*\| = 0$.

Proof: The following can be proven by induction on $k$ for a fixed $k' \ge 1$:

$$\sum_{i=1}^N y^{[i]}(k+1) = \sum_{i=1}^N y^{[i]}(k') + N \sum_{\ell=k'}^{k} \sum_{i=1}^N \big(f^{[i]}(x^{[i]}(\ell)) - f^{[i]}(x^{[i]}(\ell-1))\big). \qquad (21)$$

Let $k' = 1$ in (21) and recall that the initial state is $y^{[i]}(1) = N f^{[i]}(x^{[i]}(0))$ for all $i \in V$. Then we have

$$\sum_{i=1}^N y^{[i]}(k+1) = \sum_{i=1}^N y^{[i]}(1) + N \sum_{i=1}^N \big(f^{[i]}(x^{[i]}(k)) - f^{[i]}(x^{[i]}(0))\big) = N \sum_{i=1}^N f^{[i]}(x^{[i]}(k)). \qquad (22)$$

The combination of (22) with $\lim_{k\to+\infty} \|y^{[i]}(k) - y^{[j]}(k)\| = 0$ gives the desired result. We then
finish the proofs of Theorem 3.2. ■
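
The conservation property (22) and the resulting tracking behavior can be illustrated with a short simulation. The sketch below is an assumption-laden toy: the weight matrix `A`, the local data `C`, and the scripted primal sequences `x_of(i, k)` (which converge to the hypothetical common limit `X_LIMIT`) are all invented for illustration. It only runs the $y$-update in isolation and is not the full algorithm.

```python
N = 3
A = [[0.50, 0.25, 0.25],       # fixed doubly stochastic weights (assumption)
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
C = [0.0, 1.0, 2.0]            # hypothetical local data, f_i(x) = (x - c_i)^2
OFFSETS = [1.0, -1.0, 0.5]     # hypothetical initial disagreement
X_LIMIT = 1.0                  # common limit of the scripted primal sequences


def f(i, x):
    return (x - C[i]) ** 2


def x_of(i, k):
    """Scripted primal estimate of agent i at time k, converging to X_LIMIT."""
    return X_LIMIT + OFFSETS[i] / (k + 1)


def track(K):
    """Return y(K+1), starting from y_i(1) = N * f_i(x_i(0))."""
    y = [N * f(i, x_of(i, 0)) for i in range(N)]
    for k in range(1, K + 1):
        v = [sum(A[i][j] * y[j] for j in range(N)) for i in range(N)]
        y = [v[i] + N * (f(i, x_of(i, k)) - f(i, x_of(i, k - 1)))
             for i in range(N)]
    return y
```

Because the weights are doubly stochastic, the sum of the $y^{[i]}$ stays equal to $N\sum_i f^{[i]}(x^{[i]}(k))$ at every step, and once the inputs diminish each $y^{[i]}$ approaches the sum of the local objectives at the common limit.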

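B. Proofs of Theorem 4.2

In this part, we present the proofs of Theorem 4.2. In order to analyze the DPPDS algorithm,
we first rewrite it into the following form:

$$\mu^{[i]}(k+1) = v_\mu^{[i]}(k) + u_\mu^{[i]}(k), \qquad \lambda^{[i]}(k+1) = v_\lambda^{[i]}(k) + u_\lambda^{[i]}(k),$$
$$x^{[i]}(k+1) = v_x^{[i]}(k) + e_x^{[i]}(k), \qquad y^{[i]}(k+1) = v_y^{[i]}(k) + u_y^{[i]}(k),$$

where $e_x^{[i]}(k)$ is the projection error described by

$$e_x^{[i]}(k) := P_X[v_x^{[i]}(k) - \alpha(k)S_x^{[i]}(k)] - v_x^{[i]}(k),$$

and $u_\mu^{[i]}(k) := \alpha(k)[g(v_x^{[i]}(k))]^+$, $u_\lambda^{[i]}(k) := \alpha(k)|h(v_x^{[i]}(k))|$, $u_y^{[i]}(k) := N(f^{[i]}(x^{[i]}(k)) - f^{[i]}(x^{[i]}(k-1)))$
are some local inputs. Denote by the maximum deviations of dual estimates $M_\mu(k) :=
\max_{i \in V} \|\mu^{[i]}(k)\|$ and $M_\lambda(k) := \max_{i \in V} \|\lambda^{[i]}(k)\|$. We further denote the averages of primal
and dual estimates by $\hat{x}(k) := \frac{1}{N}\sum_{i=1}^N x^{[i]}(k)$, $\hat{\mu}(k) := \frac{1}{N}\sum_{i=1}^N \mu^{[i]}(k)$ and $\hat{\lambda}(k) := \frac{1}{N}\sum_{i=1}^N \lambda^{[i]}(k)$.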

Before showing Lemma 5.6, we present some useful facts. Since $X$ is compact, and $f^{[i]}$,
$[g(\cdot)]^+$ and $h$ are continuous, there exist $F, G^+, H > 0$ such that for all $x \in X$, it holds that
$\|f^{[i]}(x)\| \le F$ for all $i \in V$, $\|[g(x)]^+\| \le G^+$ and $\|h(x)\| \le H$. Since $X$ is a compact set
and $f^{[i]}$, $[g_\ell(\cdot)]^+$, $|h_\ell(\cdot)|$ are convex, then it follows from Proposition 5.4.2 in ^ that there
exist $D_F, D_{G^+}, D_H > 0$ such that for all $x \in X$, it holds that $\|\mathcal{D}f^{[i]}(x)\| \le D_F$ ($i \in V$),
$m\|\mathcal{D}[g_\ell(x)]^+\| \le D_{G^+}$ ($1 \le \ell \le m$) and $\nu\|\mathcal{D}|h_\ell|(x)\| \le D_H$ ($1 \le \ell \le \nu$).

Lemma 5.6 (Diminishing and summable properties): Suppose the balanced communication
assumption 2.3 and the step-size assumption 4.1 hold.

(a) It holds that $\lim_{k\to+\infty} \alpha(k)M_\mu(k) = 0$, $\lim_{k\to+\infty} \alpha(k)M_\lambda(k) = 0$, $\lim_{k\to+\infty} \alpha(k)\|S_x^{[i]}(k)\| = 0$, and
the sequences of $\{\alpha(k)^2 M_\mu^2(k)\}$, $\{\alpha(k)^2 M_\lambda^2(k)\}$ and $\{\alpha(k)^2\|S_x^{[i]}(k)\|^2\}$ are summable.

(b) The sequences $\{\alpha(k)\|\hat{\mu}(k) - v_\mu^{[i]}(k)\|\}$, $\{\alpha(k)\|\hat{\lambda}(k) - v_\lambda^{[i]}(k)\|\}$, $\{\alpha(k)M_\mu(k)\|\hat{x}(k) -
v_x^{[i]}(k)\|\}$, $\{\alpha(k)M_\lambda(k)\|\hat{x}(k) - v_x^{[i]}(k)\|\}$ and $\{\alpha(k)\|\hat{x}(k) - v_x^{[i]}(k)\|\}$ are summable.

Proof: (a) Notice that

$$\|v_\mu^{[i]}(k)\| = \Big\|\sum_{j=1}^N a^i_j(k)\mu^{[j]}(k)\Big\| \le \sum_{j=1}^N a^i_j(k)\|\mu^{[j]}(k)\| \le \sum_{j=1}^N a^i_j(k)M_\mu(k) = M_\mu(k),$$

where in the last equality we use the balanced communication assumption 2.3. Recall that
$v_x^{[i]}(k) \in X$. This implies that the following inequalities hold for all $k \ge 0$:

$$\|\mu^{[i]}(k+1)\| \le \|v_\mu^{[i]}(k) + \alpha(k)[g(v_x^{[i]}(k))]^+\| \le \|v_\mu^{[i]}(k)\| + G^+\alpha(k) \le M_\mu(k) + G^+\alpha(k).$$

From here, we deduce the following recursive estimate on $M_\mu(k+1)$: $M_\mu(k+1) \le
M_\mu(k) + G^+\alpha(k)$. Repeatedly applying the above estimates yields that

$$M_\mu(k+1) \le M_\mu(0) + G^+ s(k). \qquad (23)$$

Similar arguments can be employed to show that

$$M_\lambda(k+1) \le M_\lambda(0) + H s(k). \qquad (24)$$

Since $\lim_{k\to+\infty} \alpha(k+1)s(k) = 0$ and $\lim_{k\to+\infty} \alpha(k) = 0$, then we know that $\lim_{k\to+\infty} \alpha(k+1)M_\mu(k+1) = 0$
and $\lim_{k\to+\infty} \alpha(k+1)M_\lambda(k+1) = 0$. Notice that the following estimate on $S_x^{[i]}(k)$ holds:

$$\|S_x^{[i]}(k)\| \le D_F + D_{G^+}M_\mu(k) + D_H M_\lambda(k). \qquad (25)$$

Recall that $\lim_{k\to+\infty} \alpha(k) = 0$, $\lim_{k\to+\infty} \alpha(k)M_\mu(k) = 0$ and $\lim_{k\to+\infty} \alpha(k)M_\lambda(k) = 0$. Then the result of
$\lim_{k\to+\infty} \alpha(k)\|S_x^{[i]}(k)\| = 0$ follows. By (23), we have

$$\sum_{k=0}^{+\infty} \alpha(k)^2 M_\mu^2(k) \le \alpha(0)^2 M_\mu^2(0) + \sum_{k=1}^{+\infty} \alpha(k)^2\big(M_\mu(0) + G^+ s(k-1)\big)^2.$$

It follows from the step-size assumption 4.1 that $\sum_{k=0}^{+\infty} \alpha(k)^2 M_\mu^2(k) < +\infty$. Similarly, one can
show that $\sum_{k=0}^{+\infty} \alpha(k)^2 M_\lambda^2(k) < +\infty$. By using (23), (24) and (25), we have the following
estimate:

$$\sum_{k=0}^{+\infty} \alpha(k)^2\|S_x^{[i]}(k)\|^2 \le \alpha(0)^2\big(D_F + D_{G^+}M_\mu(0) + D_H M_\lambda(0)\big)^2$$

$$+ \sum_{k=1}^{+\infty} \alpha(k)^2\big(D_F + D_{G^+}(M_\mu(0) + G^+ s(k-1)) + D_H(M_\lambda(0) + H s(k-1))\big)^2.$$

Then the summability of $\{\alpha(k)^2\}$, $\{\alpha(k+1)^2 s(k)\}$ and $\{\alpha(k+1)^2 s(k)^2\}$ verifies that of
$\{\alpha(k)^2\|S_x^{[i]}(k)\|^2\}$.

(b) Consider the dynamics of $\mu^{[i]}(k)$, which is in the same form as the distributed projected
subgradient algorithm in [23]. Recall that $\{[g(v_x^{[i]}(k))]^+\}$ is uniformly bounded. Then, following
from Lemma 9.2 in the Appendix with $Z = \mathbb{R}^m_{\ge 0}$ and $d^{[i]}(k) = -[g(v_x^{[i]}(k))]^+$, we have the
summability of $\{\alpha(k)\max_{i \in V} \|\hat{\mu}(k) - \mu^{[i]}(k)\|\}$. Then $\{\alpha(k)\|\hat{\mu}(k) - v_\mu^{[i]}(k)\|\}$ is summable by
using the following set of inequalities:

$$\|\hat{\mu}(k) - v_\mu^{[i]}(k)\| \le \sum_{j=1}^N a^i_j(k)\|\hat{\mu}(k) - \mu^{[j]}(k)\| \le \max_{j \in V} \|\hat{\mu}(k) - \mu^{[j]}(k)\|, \qquad (26)$$

where we use $\sum_{j=1}^N a^i_j(k) = 1$. Similarly, it holds that $\sum_{k=0}^{+\infty} \alpha(k)\|\hat{\lambda}(k) - v_\lambda^{[i]}(k)\| < +\infty$.
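We now consider the evolution of $x^{[i]}(k)$. Recall that $v_x^{[i]}(k) \in X$. By Lemma 9.1 with $Z = X$,
$z = v_x^{[i]}(k) - \alpha(k)S_x^{[i]}(k)$ and $y = v_x^{[i]}(k)$, we have

$$\|x^{[i]}(k+1) - v_x^{[i]}(k)\|^2 \le \|v_x^{[i]}(k) - \alpha(k)S_x^{[i]}(k) - v_x^{[i]}(k)\|^2 - \|x^{[i]}(k+1) - (v_x^{[i]}(k) - \alpha(k)S_x^{[i]}(k))\|^2,$$

and thus $\|e_x^{[i]}(k) + \alpha(k)S_x^{[i]}(k)\| \le \alpha(k)\|S_x^{[i]}(k)\|$. With this relation, from Lemma 9.2 with
$Z = X$ and $d^{[i]}(k) = S_x^{[i]}(k)$, the following holds for some $\gamma > 0$ and $0 < \beta < 1$:

$$\|x^{[i]}(k) - \hat{x}(k)\| \le N\gamma\beta^{k-1}\sum_{i=1}^N \|x^{[i]}(0)\| + 2N\gamma\sum_{\tau=0}^{k-1} \beta^{k-\tau}\alpha(\tau)\|S_x^{[i]}(\tau)\|. \qquad (27)$$

Multiplying both sides of (27) by $\alpha(k)M_\mu(k)$ and using (25), we obtain

$$\alpha(k)M_\mu(k)\|x^{[i]}(k) - \hat{x}(k)\| \le N\gamma\sum_{i=1}^N \|x^{[i]}(0)\|\,\alpha(k)M_\mu(k)\beta^{k-1} + 2N\gamma\,\alpha(k)M_\mu(k)$$

$$\times \sum_{\tau=0}^{k-1} \beta^{k-\tau}\alpha(\tau)\big(D_F + D_{G^+}M_\mu(\tau) + D_H M_\lambda(\tau)\big).$$

Notice that the above inequalities hold for all $i \in V$. Then by employing the relation of
$ab \le \frac{1}{2}(a^2 + b^2)$ and regrouping similar terms, we obtain

$$\alpha(k)M_\mu(k)\max_{i \in V}\|x^{[i]}(k) - \hat{x}(k)\| \le N\gamma\Big(\frac{1}{2}\sum_{i=1}^N \|x^{[i]}(0)\| + (D_F + D_{G^+} + D_H)\sum_{\tau=0}^{k-1} \beta^{k-\tau}\Big)\alpha(k)^2 M_\mu^2(k)$$

$$+ \frac{1}{2}N\gamma\sum_{i=1}^N \|x^{[i]}(0)\|\beta^{2(k-1)} + N\gamma\sum_{\tau=0}^{k-1} \beta^{k-\tau}\alpha(\tau)^2\big(D_F + D_{G^+}M_\mu^2(\tau) + D_H M_\lambda^2(\tau)\big).$$

Part (a) gives that $\{\alpha(k)^2 M_\mu^2(k)\}$ is summable. Combining this fact with $\sum_{\tau=0}^{k-1} \beta^{k-\tau} \le
\sum_{k=0}^{+\infty} \beta^k = \frac{1}{1-\beta}$, we can say that the first term on the right-hand side in the above estimate is
summable. It is easy to check that the second term is also summable. It follows from Part (a) that
$\lim_{k\to+\infty} \alpha(k)^2\big(D_F + D_{G^+}M_\mu^2(k) + D_H M_\lambda^2(k)\big) = 0$ and $\{\alpha(k)^2(D_F + D_{G^+}M_\mu^2(k) + D_H M_\lambda^2(k))\}$
is summable. Then Lemma 7 in [23] with $\gamma_\ell = N\gamma\alpha(\ell)^2(D_F + D_{G^+}M_\mu^2(\ell) + D_H M_\lambda^2(\ell))$ ensures
that the third term is summable. Therefore, the summability of $\{\alpha(k)M_\mu(k)\max_{i \in V}\|x^{[i]}(k) -
\hat{x}(k)\|\}$ is guaranteed. Following the same lines in (26), one can show the summability of
$\{\alpha(k)M_\mu(k)\|v_x^{[i]}(k) - \hat{x}(k)\|\}$. Following analogous arguments, we have that $\{\alpha(k)M_\lambda(k)\|v_x^{[i]}(k) -
\hat{x}(k)\|\}$ and $\{\alpha(k)\|v_x^{[i]}(k) - \hat{x}(k)\|\}$ are summable. ■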



Remark 5.1: In Lemma 5.6, the assumption of all local constraint sets being identical is
utilized to find an upper bound of the convergence rate of $\|\hat{x}(k) - v_x^{[i]}(k)\|$ to zero. This property
is crucial to establish the summability of expansions pertaining to $\|\hat{x}(k) - v_x^{[i]}(k)\|$ in part (b). •

The following is a basic iteration relation of the DPPDS algorithm. 

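Lemma 5.7 (Basic iteration relation): The following estimates hold for any $x \in X$ and
$(\mu, \lambda) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}$:

$$\sum_{i=1}^N \|e_x^{[i]}(k) + \alpha(k)S_x^{[i]}(k)\|^2 \le \sum_{i=1}^N \alpha(k)^2\|S_x^{[i]}(k)\|^2$$

$$- \sum_{i=1}^N 2\alpha(k)\big(\mathcal{H}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) - \mathcal{H}^{[i]}(x, v_\mu^{[i]}(k), v_\lambda^{[i]}(k))\big)$$

$$+ \sum_{i=1}^N \big(\|x^{[i]}(k) - x\|^2 - \|x^{[i]}(k+1) - x\|^2\big), \qquad (28)$$

and,

$$0 \le \sum_{i=1}^N \big(\|\mu^{[i]}(k) - \mu\|^2 - \|\mu^{[i]}(k+1) - \mu\|^2\big) + \sum_{i=1}^N \big(\|\lambda^{[i]}(k) - \lambda\|^2 - \|\lambda^{[i]}(k+1) - \lambda\|^2\big)$$

$$+ \sum_{i=1}^N 2\alpha(k)\big(\mathcal{H}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) - \mathcal{H}^{[i]}(v_x^{[i]}(k), \mu, \lambda)\big) + \sum_{i=1}^N \alpha(k)^2\big(\|[g(v_x^{[i]}(k))]^+\|^2 + \|h(v_x^{[i]}(k))\|^2\big). \qquad (29)$$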

Proof: One can finish the proof by following analogous arguments in Lemma 5.3. ■

Lemma 5.8 (Achieving consensus): Let us suppose that the non-degeneracy assumption 2.2,
the balanced communication assumption 2.3 and the periodical strong connectivity assumption 2.4
hold. Consider the sequences of $\{x^{[i]}(k)\}$, $\{\mu^{[i]}(k)\}$, $\{\lambda^{[i]}(k)\}$ and $\{y^{[i]}(k)\}$ of the
distributed penalty primal-dual subgradient algorithm with the step-size sequence $\{\alpha(k)\}$ and the
associated $\{s(k)\}$ satisfying $\lim_{k\to+\infty} \alpha(k) = 0$ and $\lim_{k\to+\infty} \alpha(k+1)s(k) = 0$. Then there exists $\tilde{x} \in
X$ such that $\lim_{k\to+\infty} \|x^{[i]}(k) - \tilde{x}\| = 0$ for all $i \in V$. Furthermore, $\lim_{k\to+\infty} \|\mu^{[i]}(k) - \mu^{[j]}(k)\| = 0$,
$\lim_{k\to+\infty} \|\lambda^{[i]}(k) - \lambda^{[j]}(k)\| = 0$ and $\lim_{k\to+\infty} \|y^{[i]}(k) - y^{[j]}(k)\| = 0$ for all $i, j \in V$.
Proof: Similar to (14), we have

$$\sum_{i=1}^N \|x^{[i]}(k+1) - x\|^2 \le \sum_{i=1}^N \|x^{[i]}(k) - x\|^2 + \sum_{i=1}^N \alpha(k)^2\|S_x^{[i]}(k)\|^2 + \sum_{i=1}^N 2\alpha(k)\|S_x^{[i]}(k)\|\,\|v_x^{[i]}(k) - x\|.$$

Since $\lim_{k\to+\infty} \alpha(k)\|S_x^{[i]}(k)\| = 0$, the proofs of $\lim_{k\to+\infty} \|x^{[i]}(k) - \tilde{x}\| = 0$ for all $i \in V$ are analogous
to those in Lemma 5.4. The remainder of the proofs can be finished by Proposition 9.1 with the
properties of $\lim_{k\to+\infty} u_\mu^{[i]}(k) = 0$, $\lim_{k\to+\infty} u_\lambda^{[i]}(k) = 0$ and $\lim_{k\to+\infty} u_y^{[i]}(k) = 0$ (due to $\lim_{k\to+\infty} x^{[i]}(k) = \tilde{x}$
and $f^{[i]}$ being continuous). ■

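We now proceed to show Theorem 4.2 based on five claims.
Proof of Theorem 4.2:

Claim 1: For any $x^* \in X^*$ and $(\mu^*, \lambda^*) \in D_P^*$, the sequences of $\{\alpha(k)[\sum_{i=1}^N \mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) -
\mathcal{H}(x^*, \hat{\mu}(k), \hat{\lambda}(k))]\}$ and $\{\alpha(k)[\sum_{i=1}^N \mathcal{H}^{[i]}(v_x^{[i]}(k), \mu^*, \lambda^*) - \mathcal{H}(\hat{x}(k), \mu^*, \lambda^*)]\}$ are summable.
Proof: Observe that

$$\|\mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) - \mathcal{H}^{[i]}(x^*, \hat{\mu}(k), \hat{\lambda}(k))\|$$

$$\le \|v_\mu^{[i]}(k) - \hat{\mu}(k)\|\,\|[g(x^*)]^+\| + \|v_\lambda^{[i]}(k) - \hat{\lambda}(k)\|\,\|h(x^*)\|$$

$$\le G^+\|v_\mu^{[i]}(k) - \hat{\mu}(k)\| + H\|v_\lambda^{[i]}(k) - \hat{\lambda}(k)\|. \qquad (30)$$

By using the summability of $\{\alpha(k)\|\hat{\mu}(k) - v_\mu^{[i]}(k)\|\}$ and $\{\alpha(k)\|\hat{\lambda}(k) - v_\lambda^{[i]}(k)\|\}$ in Part (b)
of Lemma 5.6, we have that $\{\alpha(k)\sum_{i=1}^N \|\mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) - \mathcal{H}^{[i]}(x^*, \hat{\mu}(k), \hat{\lambda}(k))\|\}$ and
thus $\{\alpha(k)[\sum_{i=1}^N (\mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) - \mathcal{H}^{[i]}(x^*, \hat{\mu}(k), \hat{\lambda}(k)))]\}$ are summable. Similarly, the
following estimates hold:

$$\|\mathcal{H}^{[i]}(v_x^{[i]}(k), \mu^*, \lambda^*) - \mathcal{H}^{[i]}(\hat{x}(k), \mu^*, \lambda^*)\| \le \|f^{[i]}(v_x^{[i]}(k)) - f^{[i]}(\hat{x}(k))\|$$

$$+ \|(\mu^*)^T([g(v_x^{[i]}(k))]^+ - [g(\hat{x}(k))]^+)\| + \|(\lambda^*)^T(|h(v_x^{[i]}(k))| - |h(\hat{x}(k))|)\|$$

$$\le (D_F + D_{G^+}\|\mu^*\| + D_H\|\lambda^*\|)\|v_x^{[i]}(k) - \hat{x}(k)\|.$$

Then the property of $\sum_{k=0}^{+\infty} \alpha(k)\|\hat{x}(k) - v_x^{[i]}(k)\| < +\infty$ in Part (b) of Lemma 5.6 implies the
summability of the sequence $\{\alpha(k)\sum_{i=1}^N \|\mathcal{H}^{[i]}(v_x^{[i]}(k), \mu^*, \lambda^*) - \mathcal{H}^{[i]}(\hat{x}(k), \mu^*, \lambda^*)\|\}$ and thus the
sequence $\{\alpha(k)\sum_{i=1}^N (\mathcal{H}^{[i]}(v_x^{[i]}(k), \mu^*, \lambda^*) - \mathcal{H}^{[i]}(\hat{x}(k), \mu^*, \lambda^*))\}$. ■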
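Claim 2: Denote the weighted version of the local penalty function $\mathcal{H}^{[i]}$ over $[0, k-1]$ as

$$\hat{\mathcal{H}}^{[i]}(k) := \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\mathcal{H}^{[i]}(v_x^{[i]}(\ell), v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell)).$$

The following property holds: $\lim_{k\to+\infty} \sum_{i=1}^N \hat{\mathcal{H}}^{[i]}(k) = p^*$.

Proof: Summing (28) over $[0, k-1]$ and replacing $x$ by $x^* \in X^*$ leads to

$$\sum_{\ell=0}^{k-1} 2\alpha(\ell)\sum_{i=1}^N \big(\mathcal{H}^{[i]}(v_x^{[i]}(\ell), v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell)) - \mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell))\big)$$

$$\le \sum_{i=1}^N \|x^{[i]}(0) - x^*\|^2 + \sum_{\ell=0}^{k-1}\sum_{i=1}^N \alpha(\ell)^2\|S_x^{[i]}(\ell)\|^2. \qquad (31)$$

The summability of $\{\alpha(k)^2\|S_x^{[i]}(k)\|^2\}$ in Part (a) of Lemma 5.6 implies that the right-hand side
of (31) is finite as $k \to +\infty$, and thus

$$\limsup_{k\to+\infty} \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\Big[\sum_{i=1}^N \big(\mathcal{H}^{[i]}(v_x^{[i]}(\ell), v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell)) - \mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell))\big)\Big] \le 0. \qquad (32)$$

Pick any $(\mu^*, \lambda^*) \in D_P^*$. It follows from Theorem 4.1 that $(x^*, \mu^*, \lambda^*)$ is a saddle point of
$\mathcal{H}$ over $X \times \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}$. Since $(\hat{\mu}(k), \hat{\lambda}(k)) \in \mathbb{R}^m_{\ge 0} \times \mathbb{R}^\nu_{\ge 0}$, then we have $\mathcal{H}(x^*, \hat{\mu}(k), \hat{\lambda}(k)) \le
\mathcal{H}(x^*, \mu^*, \lambda^*) = p^*$. Combining this relation, Claim 1 and (32) renders that

$$\limsup_{k\to+\infty} \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\Big[\sum_{i=1}^N \mathcal{H}^{[i]}(v_x^{[i]}(\ell), v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell)) - p^*\Big]$$

$$\le \limsup_{k\to+\infty} \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\Big[\sum_{i=1}^N \big(\mathcal{H}^{[i]}(v_x^{[i]}(\ell), v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell)) - \mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell))\big)\Big]$$

$$+ \limsup_{k\to+\infty} \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\Big[\sum_{i=1}^N \mathcal{H}^{[i]}(x^*, v_\mu^{[i]}(\ell), v_\lambda^{[i]}(\ell)) - \mathcal{H}(x^*, \hat{\mu}(\ell), \hat{\lambda}(\ell))\Big]$$

$$+ \limsup_{k\to+\infty} \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\big(\mathcal{H}(x^*, \hat{\mu}(\ell), \hat{\lambda}(\ell)) - p^*\big) \le 0,$$

and thus $\limsup_{k\to+\infty} \sum_{i=1}^N \hat{\mathcal{H}}^{[i]}(k) \le p^*$.

On the other hand, $\hat{x}(k) \in X$ (due to the fact that $X$ is convex) implies that $\mathcal{H}(\hat{x}(k), \mu^*, \lambda^*) \ge
\mathcal{H}(x^*, \mu^*, \lambda^*) = p^*$. Along similar lines, by using (29) with $\mu = \mu^*$, $\lambda = \lambda^*$, and Claim 1, we have
the following estimate: $\liminf_{k\to+\infty} \sum_{i=1}^N \hat{\mathcal{H}}^{[i]}(k) \ge p^*$. Then we have the desired relation. ■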
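Claim 3: Denote by $\pi(k) := \sum_{i=1}^N \mathcal{H}^{[i]}(v_x^{[i]}(k), v_\mu^{[i]}(k), v_\lambda^{[i]}(k)) - \mathcal{H}(\hat{x}(k), \hat{\mu}(k), \hat{\lambda}(k))$. We also
denote the weighted version of the global penalty function $\mathcal{H}$ over $[0, k-1]$ as

$$\hat{\mathcal{H}}(k) := \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\mathcal{H}(\hat{x}(\ell), \hat{\mu}(\ell), \hat{\lambda}(\ell)).$$

The following property holds: $\lim_{k\to+\infty} \hat{\mathcal{H}}(k) = p^*$.

Proof: Notice that

$$\pi(k) = \sum_{i=1}^N \big(f^{[i]}(v_x^{[i]}(k)) - f^{[i]}(\hat{x}(k))\big) + \sum_{i=1}^N \big(v_\mu^{[i]}(k)^T[g(v_x^{[i]}(k))]^+ - v_\mu^{[i]}(k)^T[g(\hat{x}(k))]^+\big)$$

$$+ \sum_{i=1}^N \big(v_\mu^{[i]}(k)^T[g(\hat{x}(k))]^+ - \hat{\mu}(k)^T[g(\hat{x}(k))]^+\big) + \sum_{i=1}^N \big(v_\lambda^{[i]}(k)^T|h(v_x^{[i]}(k))| - v_\lambda^{[i]}(k)^T|h(\hat{x}(k))|\big)$$

$$+ \sum_{i=1}^N \big(v_\lambda^{[i]}(k)^T|h(\hat{x}(k))| - \hat{\lambda}(k)^T|h(\hat{x}(k))|\big). \qquad (33)$$

By using the boundedness of subdifferentials and the primal estimates, it follows from (33) that

$$\|\pi(k)\| \le \big(D_F + D_{G^+}M_\mu(k) + D_H M_\lambda(k)\big) \times \sum_{i=1}^N \|v_x^{[i]}(k) - \hat{x}(k)\|$$

$$+ G^+\sum_{i=1}^N \|v_\mu^{[i]}(k) - \hat{\mu}(k)\| + H\sum_{i=1}^N \|v_\lambda^{[i]}(k) - \hat{\lambda}(k)\|. \qquad (34)$$

Then it follows from (b) in Lemma 5.6 that $\{\alpha(k)\|\pi(k)\|\}$ is summable. Notice that $\|\hat{\mathcal{H}}(k) -
\sum_{i=1}^N \hat{\mathcal{H}}^{[i]}(k)\| \le \frac{1}{s(k-1)}\sum_{\ell=0}^{k-1} \alpha(\ell)\|\pi(\ell)\|$, and thus $\lim_{k\to+\infty} \|\hat{\mathcal{H}}(k) - \sum_{i=1}^N \hat{\mathcal{H}}^{[i]}(k)\| = 0$. The desired result
immediately follows from Claim 2. ■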
Claim 4: The limit point x in Lemma 15.81 is a primal optimal solution. 

Proof: Let jl{k) = (/ti(/c), ■ ■ ■ ^jljn{k)Y' E ]R>o. By the balanced communication assump- 
tion 12.31 we obtain 

N N N N 

Y^f^^k+i) = EE«K^W''^(^) +«(^)E[^(^i"(^))]^ 

1=1 i=l j=l i=l 

N N 

= Y.f^^Hk) + a{k)Y,[9ivmr. 

j=l i=l 

This implies that the sequence {jli{k)} is non-decreasing in R>o. Observe that {jli{k)} is lower 
bounded by zero. In this way, we distinguish the following two cases: 

Case 1: The sequence {fi^ik)} is upper bounded. Then {jli{k)} is convergent in ]R>o. Recall 
that lim ||/i'*'(A;) - ^J|^^^\k)\\ = for all i,j G V. This implies that there exists fx} e R>o such 

k—^+oo 

that lim \\fif{k) - = for all t E V. Observe that E,=i/"'''(^ + 1) = Er=i/"'''(0) + 
'Er=o^i^)^f=i[9iv^^\^))V- Thus, we have J2t=o^if^)^f=i[9ii^^^\k))]+ < +oo, implying 
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thatliminffc_,+oo[^frf(^))]^ = 0- Since lim \\x^^{k) - x|| = for alH G V, then lim \\v]^\k) 
and thus [g£{x)]^ = 0. 

Case 2: The sequence {jli{k)} is not upper bounded. Since {jli{k)} is non-decreasing, then 
jl£{k) — 7- +00. It follows from Claim 3 and (a) in Lemma 15.11 that it is impossible that 
T-L{x{k), fi{k),X{k)) —7- +00. Assume that [(7^(0;)]+ > 0. Then we have 

V,{x{k),fi{k),X{k)) = f{x{k)) + Nfi{kf[g{xik))]+ + NX{kf\h{x{k))\ 

>f{m) + Mk)[ge{mT. (35) 

Taking limits on both sides of (|35l) and we obtain: 



liminf ^^(^(A;), /i(A;), A(/c)) >\imsup{f{x{k))+fii{k)[gi{x{k))]^) = +00. 

k^+00 fc-s.+oo 

Then we reach a contradiction, implying that [(^^(x)]^ = 0. 

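In both cases, we have $[g_\ell(\tilde{x})]^+ = 0$ for any $1 \leq \ell \leq m$. By similar arguments, we can further prove that $|h(\tilde{x})| = 0$. Since $\tilde{x} \in X$, then $\tilde{x}$ is feasible and thus $f(\tilde{x}) \geq p^*$. On the other hand, since $\frac{\sum_{\tau=0}^{k-1} \alpha(\tau) \hat{x}(\tau)}{\sum_{\tau=0}^{k-1} \alpha(\tau)}$ is a convex combination of $\hat{x}(0), \cdots, \hat{x}(k-1)$ and $\lim_{k \to +\infty} \hat{x}(k) = \tilde{x}$, then Claim 3 and (b) in Lemma 5.1 imply that 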

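\[
f(\tilde{x}) = \lim_{k \to +\infty} f\left( \frac{\sum_{\tau=0}^{k-1} \alpha(\tau) \hat{x}(\tau)}{\sum_{\tau=0}^{k-1} \alpha(\tau)} \right) \leq \lim_{k \to +\infty} \frac{\sum_{\tau=0}^{k-1} \alpha(\tau) \mathcal{H}(\hat{x}(\tau), \hat{\mu}(\tau), \hat{\lambda}(\tau))}{\sum_{\tau=0}^{k-1} \alpha(\tau)} = p^* .
\]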
Hence, we have $f(\tilde{x}) = p^*$ and thus $\tilde{x} \in X^*$. ■ 



Claim 5: It holds that $\lim_{k \to +\infty} \|y^{[i]}(k) - p^*\| = 0$. 

Proof: The proof follows the same lines as Claim 2 of Theorem 3.2 and is thus omitted here. 



VI. Discussion 

In this section, we present some possible extensions and interesting special cases. 



A. Discussion on the periodic strong connectivity assumption in Theorem 3.2 

If $G(k)$ is undirected, then the periodic strong connectivity assumption 2.4 in 
Theorem 3.2 can be weakened into: 

Assumption 6.1 (Eventual strong connectivity): The undirected graph $(V, \cup_{k \geq s} E(k))$ is connected for all time instants $s \geq 0$. 

If $G(k)$ is undirected, the periodic strong connectivity assumption 2.4 in Theorem 3.2 can also be 
replaced with the assumption in Proposition 2 of [Hill; i.e., for any time instant $s \geq 0$, there is 
an agent connected to all other agents in the undirected graph $(V, \cup_{k \geq s} E(k))$. 

B. A generalized step-size scheme 

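The step-size scheme in the DLPDS algorithm can be slightly generalized to the case where the 
maximum deviation of step-sizes between agents at each time is not large. It is formally stated 
as follows: 

\[
\lim_{k \to +\infty} \alpha^{[i]}(k) = 0, \quad \sum_{k=0}^{+\infty} \alpha^{[i]}(k) = +\infty, \quad \sum_{k=0}^{+\infty} \alpha^{[i]}(k)^2 < +\infty, \quad \min_{i \in V} \alpha^{[i]}(k) \geq C_\alpha \max_{i \in V} \alpha^{[i]}(k),
\]

where $\alpha^{[i]}(k)$ is the step-size of agent $i$ at time $k$ and $C_\alpha \in (0, 1]$. 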
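As a small illustration of the ratio condition, the following Python sketch uses agent-dependent step-sizes of the hypothetical form $\alpha^{[i]}(k) = c_i/(k+1)$. The constants `SCALES`, the bound `C_ALPHA = 0.5` and the $1/(k+1)$ decay are assumptions chosen for the example; they are not taken from the analysis above. 

```python
SCALES = [1.0, 1.5, 2.0]  # hypothetical per-agent constants c_i
C_ALPHA = 0.5             # assumed ratio bound C_alpha in (0, 1]


def step_size(agent, k):
    """Step-size alpha^[i](k) = c_i / (k + 1) of agent `agent` at time k."""
    return SCALES[agent] / (k + 1)


def ratio_condition_holds(k):
    """Check min_i alpha^[i](k) >= C_alpha * max_i alpha^[i](k) at time k."""
    sizes = [step_size(i, k) for i in range(len(SCALES))]
    return min(sizes) >= C_ALPHA * max(sizes)


# The ratio min/max equals min(c_i) / max(c_i) = 0.5 at every time step.
all_ok = all(ratio_condition_holds(k) for k in range(1000))
```

With this choice each $\alpha^{[i]}(k)$ is a constant multiple of the harmonic sequence, so it vanishes, is not summable, and is square summable, as the scheme requires. 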

C. Discussion on the Slater's condition in Theorem 4.2 

If $g_\ell$ ($1 \leq \ell \leq m$) is linear, then the Slater's condition 2.1 can be weakened to the following: 
there exists a relative interior point $\bar{x}$ of $X$ such that $h(\bar{x}) = 0$ and $g(\bar{x}) \leq 0$. For this case, 
strong duality and the non-emptiness of the penalty dual optimal set can be ensured by 
replacing Proposition 5.3.5 [3] with Proposition 5.3.4 [3] in the proofs of Lemma 4.1. In this 
way, the convergence results of the DPPDS algorithm still hold for the case of linear $g_\ell$. 

D. The special case in the absence of inequality and equality constraints 
The following special case of problem (1) is studied in [23]: 

\[
\min_{x \in \mathbb{R}^n} \sum_{i=1}^N f^{[i]}(x), \quad \text{s.t.} \quad x \in \cap_{i=1}^N X^{[i]}. \tag{36}
\]

In order to solve problem (36), we consider the following Distributed Primal Subgradient 
Algorithm, which is a special case of the DLPDS algorithm: 

\[
x^{[i]}(k+1) = P_{X^{[i]}}\left[ v_x^{[i]}(k) - \alpha(k)\, \mathcal{D}f^{[i]}(v_x^{[i]}(k)) \right].
\]

Corollary 6.1 (Convergence properties of the distributed primal subgradient algorithm): 
Consider problem (36), and let the non-degeneracy assumption 2.2, the balanced communication 
assumption 2.3 and the periodic strong connectivity assumption 2.4 hold. Consider the sequence 
$\{x^{[i]}(k)\}$ of the distributed primal subgradient algorithm with initial states $x^{[i]}(0) \in X^{[i]}$ and the 
step-sizes satisfying $\lim_{k \to +\infty} \alpha(k) = 0$, $\sum_{k=0}^{+\infty} \alpha(k) = +\infty$, and $\sum_{k=0}^{+\infty} \alpha(k)^2 < +\infty$. Then there exists 
an optimal solution $x^*$ such that $\lim_{k \to +\infty} \|x^{[i]}(k) - x^*\| = 0$ for all $i \in V$. 

Proof: The result is an immediate consequence of Theorem 3.2 with $g(x) = 0$. ■ 
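The update of the distributed primal subgradient algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code of the paper: the local objectives $f^{[i]}(x) = \frac{1}{2}\|x - c_i\|^2$, the box constraint sets, the fixed doubly stochastic weight matrix and the step-size $\alpha(k) = 1/(k+1)$ are all hypothetical choices made for the example. 

```python
def project_box(point, lower, upper):
    """Euclidean projection of `point` onto the box [lower, upper]^n."""
    return [min(max(p, lower), upper) for p in point]


def distributed_primal_subgradient(centers, boxes, weights, num_iters):
    """Iterate x_i(k+1) = P_{X_i}[v_i(k) - alpha(k) * grad f_i(v_i(k))].

    Assumptions of this sketch: f_i(x) = 0.5 * ||x - centers[i]||^2,
    v_i(k) = sum_j weights[i][j] * x_j(k), and alpha(k) = 1 / (k + 1).
    """
    num_agents, dim = len(centers), len(centers[0])
    x = [[0.0] * dim for _ in range(num_agents)]  # x_i(0) = 0 lies in every box
    for k in range(num_iters):
        alpha = 1.0 / (k + 1)
        new_x = []
        for i in range(num_agents):
            # Consensus step: weighted average of the agents' estimates.
            v = [sum(weights[i][j] * x[j][d] for j in range(num_agents))
                 for d in range(dim)]
            # The gradient of 0.5 * ||v - c_i||^2 is v - c_i.
            moved = [v[d] - alpha * (v[d] - centers[i][d]) for d in range(dim)]
            lower, upper = boxes[i]
            new_x.append(project_box(moved, lower, upper))
        x = new_x
    return x


CENTERS = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]   # hypothetical c_i
BOXES = [(-1.0, 1.0), (-1.5, 1.0), (-1.0, 2.0)]    # hypothetical X_i
WEIGHTS = [[0.5, 0.25, 0.25],                       # doubly stochastic weights
           [0.25, 0.5, 0.25],
           [0.25, 0.25, 0.5]]

estimates = distributed_primal_subgradient(CENTERS, BOXES, WEIGHTS, 2000)
```

In this toy instance the minimizer of the summed objective over the intersection $[-1, 1]^2$ is the mean of the centers, namely the origin, so all rows of `estimates` should be close to zero and close to one another. 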

VII. Numerical examples 

In this section, we illustrate the performance of the DLPDS and DPPDS algorithms via two 
numerical examples. 

A. A numerical example of NUM for the DLPDS algorithm 

In order to study the performance of the DLPDS algorithm, we here consider a numerical 
example of network utility maximization, e.g., in IfTSl . Consider five agents and one link where 
each agent sends data through the link at a rate of $z_i$, and the link capacity is 5. The global 
decision vector $x := [z_1 \cdots z_5]^T$ is the resource allocation vector. Each agent $i$ is associated with a 
concave utility function $f^{[i]}(z_i) := \sqrt{z_i}$, representing the utility agent $i$ obtains through sending 
data at a rate of $z_i$. Agents aim to maximize the aggregate sum of local utilities, and this problem 
can be formulated as follows: 

\[
\min_{x \in \mathbb{R}^5} -\sum_{i \in V} \sqrt{z_i}, \quad \text{s.t.} \quad z_1 + z_2 + z_3 + z_4 + z_5 \leq 5, \quad x \in \cap_{i \in V} X^{[i]}, \tag{37}
\]

where local constraint sets $X^{[i]}$ are given by: 

\[
\begin{aligned}
X^{[1]} &:= [0.5, 5.5] \times [0.5, 5.5] \times [0.5, 5.5] \times [0.5, 5.5] \times [0.5, 5.5], \\
X^{[2]} &:= [0.55, 5.25] \times [0.55, 5.25] \times [0.55, 5.25] \times [0.55, 5.25] \times [0.55, 5.25], \\
X^{[3]} &:= [0.5, 6] \times [0.5, 6] \times [0.5, 6] \times [0.5, 6] \times [0.5, 6], \\
X^{[4]} &:= [0.5, 5] \times [0.5, 5] \times [0.5, 5] \times [0.5, 5] \times [0.5, 5], \\
X^{[5]} &:= [0.525, 5.75] \times [0.525, 5.75] \times [0.525, 5.75] \times [0.525, 5.75] \times [0.525, 5.75].
\end{aligned}
\]

We use the DLPDS algorithm to solve problem (37) by choosing step-size $\alpha(k)$ = -j^. Figures 1 
to 5 show the simulation results of the DLPDS algorithm in comparison with the centralized 
subgradient algorithm. They demonstrate that all the agents take lO'' iterates to agree upon the 
optimal solution $[1\ 1\ 1\ 1\ 1]^T$. Furthermore, it can be observed that the optimal solution can be 
found by the centralized subgradient algorithm with the same step-size after 200 iterates, which 
is far fewer than the DLPDS algorithm requires. 
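The reported optimal solution $[1\ 1\ 1\ 1\ 1]^T$ can be sanity-checked without running the DLPDS algorithm. The Python sketch below intersects the five local boxes, draws random feasible allocations and compares their total utility with that of the equal allocation. The sampling scheme and the helper names are illustrative assumptions, not part of the original experiment. 

```python
import math
import random

LOWERS = [0.5, 0.55, 0.5, 0.5, 0.525]    # lower bounds of the five local boxes
UPPERS = [5.5, 5.25, 6.0, 5.0, 5.75]     # upper bounds of the five local boxes
LOWER, UPPER = max(LOWERS), min(UPPERS)  # their intersection is [0.55, 5]^5
CAPACITY = 5.0


def total_utility(z):
    """Aggregate utility sum_i sqrt(z_i)."""
    return sum(math.sqrt(zi) for zi in z)


def random_feasible_point(rng):
    """Draw a point of the box and shrink it toward its lower corner
    until the link capacity constraint sum_i z_i <= 5 is met."""
    z = [rng.uniform(LOWER, UPPER) for _ in range(5)]
    excess = sum(zi - LOWER for zi in z)
    budget = CAPACITY - 5 * LOWER
    if excess > budget:
        scale = budget / excess
        z = [LOWER + scale * (zi - LOWER) for zi in z]
    return z


rng = random.Random(0)
candidate_utility = total_utility([1.0] * 5)
best_random_utility = max(total_utility(random_feasible_point(rng))
                          for _ in range(10000))
```

By the Cauchy-Schwarz inequality, $\sum_i \sqrt{z_i} \leq \sqrt{5 \sum_i z_i} \leq 5$ for every feasible allocation, with equality only at $z_i = 1$, so no sampled point should beat the equal allocation. 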



B. A numerical example for the DPPDS algorithm 

Consider a network with five agents whose objective functions are defined as 

\[
\begin{aligned}
f^{[1]}(x) &:= \tfrac{1}{5}\left( (a - 5)^2 + (b - 2.5)^2 + (c - 5)^2 + (d + 2.5)^2 + (e + 5)^2 \right), \\
f^{[2]}(x) &:= \tfrac{1}{5}\left( (a - 2.5)^2 + (b - 5)^2 + (c + 2.5)^2 + (d + 5)^2 + (e - 5)^2 \right), \\
f^{[3]}(x) &:= \tfrac{1}{5}\left( (a - 5)^2 + (b + 2.5)^2 + (c + 5)^2 + (d - 5)^2 + (e - 2.5)^2 \right), \\
f^{[4]}(x) &:= \tfrac{1}{5}\left( (a + 2.5)^2 + (b + 5)^2 + (c - 5)^2 + (d - 2.5)^2 + (e - 5)^2 \right), \\
f^{[5]}(x) &:= \tfrac{1}{5}\left( (a + 5)^2 + (b - 5)^2 + (c - 2.5)^2 + (d - 5)^2 + (e + 2.5)^2 \right),
\end{aligned}
\]

where the global decision vector $x := [a\ b\ c\ d\ e]^T \in \mathbb{R}^5$. The global equality constraint 
function is given by $h(x) := a + b + c + d + e - 5$, and the global constraint set is as follows: 
$X := [-5, 5] \times [-5, 5] \times [-5, 5] \times [-5, 5] \times [-5, 5]$. Consider the following optimization problem: 

\[
\min_{x \in \mathbb{R}^5} \sum_{i \in V} f^{[i]}(x), \quad \text{s.t.} \quad h(x) = 0, \quad x \in X.
\]

We employ the DPPDS algorithm to solve the above optimization problem with the step-size 
$\alpha(k)$ = -j^. Its simulation results are shown in Figures 6 to 10 in comparison with the 
performance of the centralized subgradient algorithm. Observe that all the agents asymptotically 
achieve the optimal solution $[1\ 1\ 1\ 1\ 1]^T$. As with the DLPDS algorithm, the convergence rate of 
the DPPDS algorithm is slower than that of the centralized algorithm. 
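Since each local objective is a scaled sum of squared distances to a fixed center, the optimal solution reported above can be checked in closed form: the unconstrained minimizer of the summed objective is the average of the five centers, and it remains optimal if it happens to satisfy $h(x) = 0$ and $x \in X$. The Python sketch below carries out this check. The center of the $d$ term for agent 5 is not legible in the source and is assumed here to be 5, the value consistent with the reported solution. 

```python
# Row i holds the centers of agent i's quadratic terms for (a, b, c, d, e).
CENTERS = [
    [5.0, 2.5, 5.0, -2.5, -5.0],
    [2.5, 5.0, -2.5, -5.0, 5.0],
    [5.0, -2.5, -5.0, 5.0, 2.5],
    [-2.5, -5.0, 5.0, 2.5, 5.0],
    [-5.0, 5.0, 2.5, 5.0, -2.5],  # 4th entry (d term) is an assumption
]


def objective(x):
    """Sum over agents of (1/5) * ||x - c_i||^2."""
    return sum(sum((xd - cd) ** 2 for xd, cd in zip(x, c)) / 5.0
               for c in CENTERS)


def unconstrained_minimizer():
    """The sum of the quadratics is minimized at the mean of the centers."""
    n = len(CENTERS)
    return [sum(c[d] for c in CENTERS) / n for d in range(5)]


x_star = unconstrained_minimizer()
equality_residual = sum(x_star) - 5.0          # value of h(x_star)
in_box = all(-5.0 <= v <= 5.0 for v in x_star)  # membership in X
```

Under the stated assumption, `x_star` equals the vector of ones, `equality_residual` is zero and `in_box` is true, so the constraints are inactive at the unconstrained minimizer. 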

VIII. Conclusion 

We have studied a multi-agent optimization problem where the agents aim to minimize a sum 
of local objective functions subject to a global inequality constraint, a global equality constraint 
and a global constraint set defined as the intersection of local constraint sets. We have considered 
two cases: the first one in the absence of the equality constraint and the second one with identical 
local constraint sets. To address these cases, we have introduced two distributed subgradient 
algorithms which are based on Lagrangian and penalty primal-dual methods, respectively. These 
two algorithms were shown to asymptotically converge to primal solutions and optimal values. 
Two numerical examples were presented to demonstrate the performance of our algorithms. Our 
future work includes explicit characterization of convergence rates of the algorithms in this paper. 

IX. Appendix 

A. Dynamic average consensus algorithms 

The following is the vector version of the first-order dynamic average consensus algorithm 
proposed in [35] with $x^{[i]}(k), \xi^{[i]}(k) \in \mathbb{R}^n$: 

\[
x^{[i]}(k+1) = \sum_{j=1}^N a^i_j(k)\, x^{[j]}(k) + \xi^{[i]}(k). \tag{38}
\]

Proposition 9.1: Denote $\Delta\xi_\ell(k) := \max_{i \in V} \xi_\ell^{[i]}(k) - \min_{i \in V} \xi_\ell^{[i]}(k)$ for $1 \leq \ell \leq n$. Let the 
non-degeneracy assumption 2.2, the balanced communication assumption 2.3 and the periodic 
strong connectivity assumption 2.4 hold. Assume that $\lim_{k \to +\infty} \Delta\xi_\ell(k) = 0$ for all $1 \leq \ell \leq n$. 
Then $\lim_{k \to +\infty} \|x^{[i]}(k) - x^{[j]}(k)\| = 0$ for all $i, j \in V$. 

B. A property of projection operators 

The proof of the following lemma can be found in |l3l, JH and ||23l . 

Lemma 9.1: Let $Z$ be a non-empty, closed and convex set in $\mathbb{R}^n$. For any $z \in \mathbb{R}^n$, the 
following holds for any $y \in Z$: $\|P_Z[z] - y\|^2 \leq \|z - y\|^2 - \|P_Z[z] - z\|^2$. 
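The inequality of Lemma 9.1 is easy to probe numerically. The Python sketch below does so for the particular set $Z = [-1, 1]^3$, for which the projection is a coordinate-wise clamp. The choice of $Z$, the sampling ranges and the function names are assumptions made for illustration only. 

```python
import random


def project_box(z, lower, upper):
    """Euclidean projection of z onto the box [lower, upper]^n."""
    return [min(max(zi, lower), upper) for zi in z]


def squared_distance(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def projection_gap(z, y, lower, upper):
    """Return ||z - y||^2 - ||P_Z[z] - z||^2 - ||P_Z[z] - y||^2.

    For y in Z, the projection inequality states that this gap is
    non-negative.
    """
    p = project_box(z, lower, upper)
    return (squared_distance(z, y) - squared_distance(p, z)
            - squared_distance(p, y))


rng = random.Random(0)
gaps = []
for _ in range(1000):
    z = [rng.uniform(-10.0, 10.0) for _ in range(3)]  # arbitrary point of R^3
    y = [rng.uniform(-1.0, 1.0) for _ in range(3)]    # point of Z = [-1, 1]^3
    gaps.append(projection_gap(z, y, -1.0, 1.0))
min_gap = min(gaps)
```

For instance, with $z = (2, 0, 0)$ and $y = 0$ the projection is $(1, 0, 0)$ and the gap equals $4 - 1 - 1 = 2$. 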

C. Some properties of the distributed projected subgradient algorithm in / [25l/ 

Consider the following distributed projected subgradient algorithm proposed in [23]: $x^{[i]}(k+1) = P_Z[v_x^{[i]}(k) - \alpha(k) d^{[i]}(k)]$. Denote by $e^{[i]}(k) := P_Z[v_x^{[i]}(k) - \alpha(k) d^{[i]}(k)] - v_x^{[i]}(k)$. The 
following is a slight modification of Lemma 8 and its proof in fTSj. 

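Lemma 9.2: Let the non-degeneracy assumption 2.2, the balanced communication assumption 2.3 
and the periodic strong connectivity assumption 2.4 hold. Suppose $Z \subseteq \mathbb{R}^n$ is a closed 
and convex set. Then there exist $\gamma > 0$ and $\beta \in (0, 1)$ such that 

\[
\|x^{[i]}(k) - \hat{x}(k)\| \leq N\gamma \sum_{\tau=0}^{k-1} \beta^{k-\tau} \left\{ \alpha(\tau) \|d^{[i]}(\tau)\| + \|e^{[i]}(\tau) + \alpha(\tau) d^{[i]}(\tau)\| \right\} + N\gamma \beta^{k-1} \sum_{i=1}^{N} \|x^{[i]}(0)\| .
\]

Suppose $\{d^{[i]}(k)\}$ is uniformly bounded for each $i \in V$, and $\sum_{k=0}^{+\infty} \alpha(k)^2 < +\infty$; then we have 
$\sum_{k=0}^{+\infty} \alpha(k) \max_{i \in V} \|x^{[i]}(k) - \hat{x}(k)\| < +\infty$. 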

Fig. 1. Estimates of variable $z_1$ of centralized algorithm and the DLPDS algorithm 

Fig. 2. Estimates of variable $z_2$ of centralized algorithm and the DLPDS algorithm 

Fig. 4. Estimates of variable $z_4$ of centralized algorithm and the DLPDS algorithm 

Fig. 5. Estimates of variable $z_5$ of centralized algorithm and the DLPDS algorithm 

Fig. 6. Estimates of variable a in the DPPDS algorithm 



[Figure 7 panels: estimates of the centralized algorithm for variable b; estimates of agents 1, 2, 3, 4 and 5 for variable b. Horizontal axis ticks: 1000 to 6000.] 

Fig. 7. Estimates of variable b in the DPPDS algorithm 

Fig. 8. Estimates of variable c in the DPPDS algorithm 

Fig. 9. Estimates of variable d in the DPPDS algorithm 

Fig. 10. Estimates of variable e in the DPPDS algorithm 